An Arrogant Programmer's Adventures in Assembly (and C), by Matthew Bell<br />
<br />
Benchmarking My Changes (2017-04-16)<br />
<br />
In my last blog post I explained how I was able to optimize the strfry() function from glibc, and now I have to test for regressions. I ran my previous tests on an x86 virtual machine on my laptop. The issue with that is that we're supposed to be targeting our code for aarch64, so now I need to test there!<br />
<br />
For reference, here is the code I am using to test both the current glibc strfry and my own:<br />
<br />
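<span style="font-family: Courier New, Courier, monospace;">#define _GNU_SOURCE /* strfry() is a GNU extension; this makes string.h declare it */</span><br />
<span style="font-family: Courier New, Courier, monospace;">#include <string.h></span><br />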
<span style="font-family: Courier New, Courier, monospace;">#include <stdio.h></span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">#define STR_LEN 1000000</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">#define LOOP_COUNT 10000</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">#define CHARSET_SIZE 26</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">int main()</span><br />
<span style="font-family: Courier New, Courier, monospace;">{</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>char charset[CHARSET_SIZE];</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>char text[STR_LEN];</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>for(int i = 0; i < CHARSET_SIZE; i++){</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>charset[i] = (char)('a' + i);</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>}</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>int i;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>for(i = 0; i + CHARSET_SIZE < STR_LEN; i += CHARSET_SIZE){ /* stop while there is still room for the '\0' */</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>memcpy(text + i, charset, CHARSET_SIZE);</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>}</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>text[i] = '\0'; /* i is the first unfilled index, always inside text */</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>for(i = 0; i < LOOP_COUNT; i++){</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>strfry(text);</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>}</span><br />
<span style="font-family: Courier New, Courier, monospace;">}</span><br />
<br />
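The time command measures the whole process, including filling the 1 MB buffer. If you only want to time the strfry loop, one option is to read the POSIX CLOCK_MONOTONIC clock with clock_gettime() around it. This is a minimal sketch, not what I actually ran: elapsed_seconds is a hypothetical helper, and the commented usage assumes the benchmark's own text, i and LOOP_COUNT.

```c
#include <time.h>

/* Hypothetical helper: elapsed wall-clock seconds between two
 * clock_gettime(CLOCK_MONOTONIC, ...) readings. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Sketch of use inside the benchmark's main(), after text is filled:
 *
 *     struct timespec t0, t1;
 *     clock_gettime(CLOCK_MONOTONIC, &t0);
 *     for (i = 0; i < LOOP_COUNT; i++)
 *         strfry(text);
 *     clock_gettime(CLOCK_MONOTONIC, &t1);
 *     printf("strfry loop: %.3f s\n", elapsed_seconds(t0, t1));
 */
```

A monotonic clock is used because, unlike the wall-clock time of day, it cannot jump if the system clock is adjusted mid-run.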
To keep my benchmarks consistent with my peers', I first re-ran my tests on Xerxes, the x86_64 server we have available.<br />
<br />
Results of running the current glibc strfry on Xerxes:<br />
<span style="font-family: Courier New, Courier, monospace;">real<span class="Apple-tab-span" style="white-space: pre;"> </span>4m15.630s</span><br />
<span style="font-family: Courier New, Courier, monospace;">user<span class="Apple-tab-span" style="white-space: pre;"> </span>4m15.525s</span><br />
<span style="font-family: Courier New, Courier, monospace;">sys<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.003s</span><br />
<br />
And the results for running my implementation on Xerxes:<br />
<span style="font-family: Courier New, Courier, monospace;">real<span class="Apple-tab-span" style="white-space: pre;"> </span>2m54.665s</span><br />
<span style="font-family: Courier New, Courier, monospace;">user<span class="Apple-tab-span" style="white-space: pre;"> </span>2m54.609s</span><br />
<span style="font-family: Courier New, Courier, monospace;">sys<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.005s</span><br />
<br />
This is roughly a 32% reduction in run time, even better than the 5-10% I was getting on my virtual machine! However, when tested on Betty (our aarch64 server), the results are less impressive.<br />
<br />
The current implementation:<br />
<span style="font-family: Courier New, Courier, monospace;">real<span class="Apple-tab-span" style="white-space: pre;"> </span>3m21.161s</span><br />
<span style="font-family: Courier New, Courier, monospace;">user<span class="Apple-tab-span" style="white-space: pre;"> </span>3m21.160s</span><br />
<span style="font-family: Courier New, Courier, monospace;">sys<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.000s</span><br />
<br />
Mine:<br />
<span style="font-family: Courier New, Courier, monospace;">real<span class="Apple-tab-span" style="white-space: pre;"> </span>3m18.495s</span><br />
<span style="font-family: Courier New, Courier, monospace;">user<span class="Apple-tab-span" style="white-space: pre;"> </span>3m18.500s</span><br />
<span style="font-family: Courier New, Courier, monospace;">sys<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.000s</span><br />
<br />
This is only a roughly 1% improvement. I can only speculate about the cause; my assumption is that aarch64 is a highly optimized architecture that is better able to handle large sets of data, so the extra strlen pass costs relatively little there.<br />
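The percentages above come from simple arithmetic on the "real" times, converted to seconds (4m15.630s is 255.630 s, 2m54.665s is 174.665 s, and so on). A minimal sketch of that arithmetic, with hypothetical helper names, also shows the difference between "percent of run time saved" and "times faster", which are easy to mix up:

```c
/* Hypothetical helpers for turning two run times (in seconds) into the
 * figures quoted in these posts. */

/* Share of the original run time that was saved, as a percentage. */
double percent_reduction(double before, double after)
{
    return (before - after) / before * 100.0;
}

/* How many times faster the new version is (before / after). */
double speedup(double before, double after)
{
    return before / after;
}

/* With the "real" times above:
 *   Xerxes: percent_reduction(255.630, 174.665) is about 31.7 (%),
 *           speedup(255.630, 174.665) is about 1.46
 *   Betty:  percent_reduction(201.161, 198.495) is about 1.3 (%)
 */
```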
<br />
<br />
You can view my exact code in my SPO600 GitHub pull request <a href="https://github.com/ctyler/spo600-glibc/pull/2" target="_blank">here</a>.<br />
<br />
strfry is totally optimizable (2017-04-09)<br />
<br />
A note: I've always been partial to the idea of "let the code do the talking", and if my last blog post didn't reinforce that, I don't know what does! It was a bit of a mess, and rather rushed.<br />
<br />
So, I previously remarked that I couldn't optimize strfry. I was wrong. For reference, here is the current implementation of strfry:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">char* </span><span style="font-family: "Courier New", Courier, monospace;">strfry (char *string)</span><br />
<span style="font-family: Courier New, Courier, monospace;">{</span><br />
<span style="font-family: Courier New, Courier, monospace;"> static int init;</span><br />
<span style="font-family: Courier New, Courier, monospace;"> static struct random_data rdata;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"> if (!init)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> {</span><br />
<span style="font-family: Courier New, Courier, monospace;"> static char state[32];</span><br />
<span style="font-family: Courier New, Courier, monospace;"> rdata.state = NULL;</span><br />
<span style="font-family: Courier New, Courier, monospace;"> __initstate_r (time ((time_t *) NULL) ^ getpid (),</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span> state, sizeof (state), &rdata);</span><br />
<span style="font-family: Courier New, Courier, monospace;"> init = 1;</span><br />
<span style="font-family: Courier New, Courier, monospace;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"> size_t len = strlen (string);</span><br />
<span style="font-family: Courier New, Courier, monospace;"> if (len > 0)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> for (size_t i = 0; i < len - 1; ++i)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> {</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>int32_t j;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>__random_r (&rdata, &j);</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>j = j % (len - i) + i;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>char c = string[i];</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>string[i] = string[j];</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>string[j] = c;</span><br />
<span style="font-family: Courier New, Courier, monospace;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"> return string;</span><br />
<span style="font-family: Courier New, Courier, monospace;">}</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: inherit;">As a simple test case, the following code was run:</span><br />
<span style="font-family: inherit;"><br /></span>
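<span style="font-family: Courier New, Courier, monospace;">#define _GNU_SOURCE /* strfry() is a GNU extension */</span><br />
<span style="font-family: Courier New, Courier, monospace;">#include <string.h></span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">#define STR_LEN 1000000</span><br />
<span style="font-family: Courier New, Courier, monospace;">#define LOOP_COUNT 10000</span><br />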
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">#define CHARSET_SIZE 26</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">int main(){</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>char charset[CHARSET_SIZE];</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>char text[STR_LEN];</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>int i;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>//fill the charset</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>for(i = 0; i < CHARSET_SIZE; i++){</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>charset[i] = 'a' + i;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>}</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>//fill the text</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>for(i = 0; i + CHARSET_SIZE < STR_LEN; i += CHARSET_SIZE){ /* stop while there is still room for the '\0' */</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>memcpy(text + i, charset, CHARSET_SIZE);</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>}</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>text[i] = '\0';</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>for(i = 0; i < LOOP_COUNT; i++){</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>strfry(text);</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>}</span><br />
<br />
<span style="font-family: Courier New, Courier, monospace;">}</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
And the results when timed (compiled with gcc and statically linked for faster execution):<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">real<span class="Apple-tab-span" style="white-space: pre;"> </span>3m3.960s</span><br />
<span style="font-family: Courier New, Courier, monospace;">user<span class="Apple-tab-span" style="white-space: pre;"> </span>3m3.676s</span><br />
<span style="font-family: Courier New, Courier, monospace;">sys<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.020s</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Can you see where the function can be optimized? The strlen call is actually unneeded! It iterates through the full string just to get the length before the loop shuffles the contents, but the shuffle doesn't actually need the length up front: we can simply iterate until we hit the null character. Our changed loop looks like this:</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"> if (string[0])</span><br />
<span style="font-family: Courier New, Courier, monospace;"> for (size_t i = 1; string[i]; ++i)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> {</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>int32_t j;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>__random_r (&rdata, &j);</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>j = j % (i + 1); /* j in [0, i], so string[i] may also stay put */</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>char c = string[i];</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>string[i] = string[j];</span><br />
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>string[j] = c;</span><br />
<br />
<span style="font-family: Courier New, Courier, monospace;"> }<br /></span><br />
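To try the strlen-free idea outside of glibc, here is a minimal self-contained sketch. It is not the patch itself: shuffle_no_strlen is a hypothetical name, it uses rand() in place of glibc's internal __random_r(), and it draws the swap index from 0 to i inclusive, which is the forward (left-to-right) form of the Fisher-Yates shuffle.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the patched strfry(): shuffles the characters
 * of s in place without calling strlen() first. rand() replaces glibc's
 * internal __random_r(); like glibc's own "j % (len - i)", the modulo
 * below has a slight bias. */
char *shuffle_no_strlen(char *s)
{
    if (s[0] == '\0')
        return s;                            /* empty: s[1] must not be read */
    for (size_t i = 1; s[i] != '\0'; ++i) {  /* stop at the terminator */
        size_t j = (size_t)rand() % (i + 1); /* j in [0, i], inclusive */
        char c = s[i];
        s[i] = s[j];
        s[j] = c;
    }
    return s;
}
```

The i + 1 matters: drawing j from 0 to i - 1 instead would never let the character at i stay where it is (a two-character string would always come out reversed), so not every ordering could be produced.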
<span style="font-family: inherit;">As you can see, we only removed the strlen call, changed the loop condition and the initial if statement, and changed how we compute the index to swap with. These are all relatively small changes, but they have a noticeable impact when executed. The results when timed with my modification:</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">real<span class="Apple-tab-span" style="white-space: pre;"> </span>2m47.227s</span><br />
<span style="font-family: Courier New, Courier, monospace;">user<span class="Apple-tab-span" style="white-space: pre;"> </span>2m47.024s</span><br />
<span style="font-family: Courier New, Courier, monospace;">sys<span class="Apple-tab-span" style="white-space: pre;"> </span>0m0.020s</span><br />
<br />
That works out to roughly a 9% improvement (about 17 seconds)! Now I can go and try to get this reviewed by my fellow classmates, and see what they say!<br />
<br />
Optimizing glibc (2017-04-03)<br />
<br />
So for our course's final assignment, we are tasked with finding a function in glibc and optimizing it. At first I assumed this project would be relatively easy (spoiler: it's not), and decided to leave picking my function to the last minute (literally, I picked it in class while others were presenting their selected functions). I ended up stumbling upon the function "strstr", which locates the first occurrence of a string inside another string. After reviewing the function, however, I was forced to conclude that I (with my current knowledge) am unable to optimize it further.<br />
<br />
Instead, I decided to look at the function "strfry", which randomly shuffles the characters of a string. At first glance I thought it would be optimizable, but now I'm not so sure. I need to discuss this with our professor.<br />
<br />
SPO600 - Lab 6 - Vectorization Lab (2017-03-04)<br />
<br />
Within this lab we were tasked with writing some code for the compiler to auto-vectorize, and then analyzing the disassembled machine code. The following code may seem inefficient, but I had to write it this way to ensure that only one section of the code would get vectorized. For reference, this code was compiled using: "gcc -O3 -g -o lab6 lab6.c"<br />
<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">#include <stdlib.h></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">#include <stdio.h></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">#define SIZE 1000</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">#define ARR_SIZE (sizeof(int) * SIZE)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">int main(){</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> int* arr1 = malloc(ARR_SIZE);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> int* arr2 = malloc(ARR_SIZE);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> int* sum = malloc(ARR_SIZE);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> long long finalSum = 0;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> size_t i;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> for(i = 0; i < SIZE; i++) {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> arr1[i] = rand() / 2; // halved so arr1[i] + arr2[i] cannot overflow an int</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> }</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> for(i = 0; i < SIZE; i++) {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> arr2[i] = rand() / 2;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> }</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> for(i = 0; i < SIZE; i++){</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> sum[i] = arr1[i] + arr2[i];</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> }</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> for(i = 0; i < SIZE; i++) {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> finalSum += sum[i];</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> }</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> printf("Final sum:%lld\n", finalSum);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> free(arr1);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> free(arr2);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> free(sum);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><br />
In past posts I've broken the code down into parts and explained each one. However, the disassembled code here is much longer than before, and much of it isn't vectorized, so we're just going to look at the code for the 3rd loop: it's relatively simple, and we only have to analyze a small section of it to understand how the vector operations work!<br />
Here is the main loop for calculating the sum array:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;"> 40062c: d2800001 mov x1, #0x0</span><br />
<span style="font-family: "Courier New", Courier, monospace;"> 400630: 4cdf7861 ld1 {v1.4s}, [x3], #16</span><br />
<span style="font-family: Courier New, Courier, monospace;"> 400634: 4cdf7880 ld1 {v0.4s}, [x4], #16</span><br />
<span style="font-family: Courier New, Courier, monospace;"> 400638: 4ea08420 add v0.4s, v1.4s, v0.4s</span><br />
<span style="font-family: Courier New, Courier, monospace;"> 40063c: 91000421 add x1, x1, #0x1</span><br />
<span style="font-family: Courier New, Courier, monospace;"> 400640: 4c9f7840 st1 {v0.4s}, [x2], #16</span><br />
<br />
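Without any context it can be rather difficult to tell what this code does. The first line sets x1 (the loop counter) to zero. The second and third lines load values from arr1 and arr2 into the vector registers v1 and v0: each 128-bit vector register holds 16 bytes, which is four 32-bit integers (hence the .4s suffix), and the trailing #16 advances each pointer by 16 bytes for the next iteration. The add instruction then adds the two vectors lane by lane and stores the four results in v0. We then increment the loop counter by one (each iteration covers four elements), and st1 stores the results of our addition into memory (x2 points into the sum array).<br />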
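To make the ld1/add/st1 pattern concrete in C, here is a minimal sketch using GCC's vector extensions (the vector_size attribute, also supported by Clang). It is an illustration rather than what the compiler generated: at -O3 GCC already vectorizes the plain loop on its own, and add_arrays_v4 is a hypothetical name. Each v4si value holds four 32-bit ints, like a .4s register.

```c
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension: four 32-bit int lanes in a 16-byte value,
 * matching the .4s arrangement of a 128-bit vector register. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: sum[i] = a[i] + b[i] for n elements, processed
 * four at a time like the ld1/add/st1 loop, then a scalar tail. */
void add_arrays_v4(const int *a, const int *b, int *sum, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4si va, vb, vs;
        memcpy(&va, a + i, sizeof va);   /* like ld1 {v1.4s}, [x3], #16 */
        memcpy(&vb, b + i, sizeof vb);   /* like ld1 {v0.4s}, [x4], #16 */
        vs = va + vb;                    /* like add v0.4s, v1.4s, v0.4s */
        memcpy(sum + i, &vs, sizeof vs); /* like st1 {v0.4s}, [x2], #16 */
    }
    for (; i < n; i++)                   /* leftover elements, one at a time */
        sum[i] = a[i] + b[i];
}
```

The memcpy calls avoid assuming the int arrays are 16-byte aligned; compilers typically turn them into single vector loads and stores.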
<br />
As you can see, vector operations produce more complicated code, but because each iteration processes four integers instead of one, the loop should run considerably faster than doing plain scalar addition on each element.<br />
<br />
<br />
If you would like to see the disassembled and annotated code for the main function:<br />
<a href="http://pastebin.com/uiZJK3XM" target="_blank">Click here</a><br />
Please note that I only annotated the code up to the first vectorized section; by that point I understood the rest of the function's body rather well.<br />
<br />
SPO600 - Lab 5 - Algorithm Selection Lab (2017-02-18)<br />
<br />
For this lab we were required to write a program that multiplies a large set of 16-bit signed integers by a single floating-point value between 0.0 and 1.0. This simulates how volume scaling works, and it was used to show how the compiler optimizes the code we write, and how we can structure our code so the compiler can optimize it better.<br />
<br />
The lab itself asked us to write two separate implementations: a naive one and a potentially more optimized one, for which we chose to use a lookup table. To assist with our benchmarking, we created a struct called "Result", which looks something like this:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">struct Result{</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> double elapsed, elapsed_middle;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> struct timeval time_start,time_end;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> double sum;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">};</span><br />
<br />
<br />
Our naive implementation was about as efficient as expected, taking 2768ms to process a set of 500,000,000 pseudo-random samples (read from /dev/urandom, and compiled with gcc's -O3 flag), whereas our custom implementation takes 1852ms. That's a difference of 916ms, or a 33% reduction in run time (about 1.5 times as fast). When compiling without any optimizations, our naive function takes 18080ms, whereas our custom one takes 11756ms: an improvement of 6324ms, or a 35% reduction in run time (about 1.54 times as fast).<br />
<br />
For reference, here are our naive and custom functions:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">struct Result sum_custom() {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> int i;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> struct Result res;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> int16_t table[0x10000]; // one entry for each of the 65536 possible 16-bit values</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> int idx;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> gettimeofday(&res.time_start, NULL);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> res.sum = 0;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> for(i = -32768; i <= 32767; i++){ // include -32768 so table[0x8000] is filled too</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> table[i & 0xFFFF] = i * VOL;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> }</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> int sz = SIZE;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> for(i = 0; i < sz ; i++){</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> idx = (unsigned short) data[i];</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> res.sum += output[i] = table[idx];</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> }</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> gettimeofday(&res.time_end, NULL);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> return res;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">struct Result sum_naive(){</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> size_t i;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> struct Result res;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> int16_t val;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> gettimeofday(&res.time_start, NULL);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> res.sum = 0;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> for(i = 0; i < SIZE;i++){</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> val = data[i] * VOL;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> output[i] = val;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> res.sum += val; // we're adding here to prevent the compiler from optimizing</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>// our whole loop out.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> }</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> gettimeofday(&res.time_end, NULL);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> return res;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
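The trick behind the lookup table is that casting a 16-bit sample to an unsigned 16-bit value maps every possible sample (-32768 to 32767) onto a unique index from 0 to 65535, so the table needs 0x10000 entries. Here is a minimal self-contained sketch of that idea, separate from the lab code: build_scale_table and scale_sample are hypothetical names, and the volume is passed in as a float instead of the VOL constant.

```c
#include <stdint.h>

#define SCALE_TABLE_SIZE 0x10000  /* one entry per possible 16-bit pattern */

/* Fill the table so that table[(uint16_t)s] == (int16_t)(s * vol)
 * for every int16_t sample s, including INT16_MIN. */
void build_scale_table(int16_t table[SCALE_TABLE_SIZE], float vol)
{
    for (int32_t s = INT16_MIN; s <= INT16_MAX; s++)
        table[(uint16_t)s] = (int16_t)(s * vol);  /* vol in [0.0, 1.0] */
}

/* Scale one sample with a single indexed load instead of a
 * float multiply and conversion. */
static inline int16_t scale_sample(const int16_t table[SCALE_TABLE_SIZE],
                                   int16_t sample)
{
    return table[(uint16_t)sample];
}
```

Converting a negative value to uint16_t is well-defined in C (it wraps modulo 65536), so -1 lands at index 0xFFFF and -32768 at index 0x8000.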
<br />
SPO600 - Lab 4 - Compiled C Code (2017-01-31)<br />
<br />
Within this lab, we looked at how various flags passed to the GNU C Compiler affect the compiled binary, along with how changes to our code affect it.<br />
<br />
<h3>
The -static Flag</h3>
<div>
The -static flag tells GCC to link the program statically: instead of leaving calls into shared libraries to be resolved at run time (through the procedure linkage table), the linker copies the needed library code into the compiled binary itself. In turn that makes the binary grow dramatically, because the library functions we call depend on many other library functions, and all of that code gets pulled into the binary as well.</div>
<div>
<br /></div>
<h3>
The -fno-builtin Flag</h3>
<div>
As the title implies, this tells GCC not to optimize calls to built-in functions. In class this was shown using printf. Without the flag, the compiler replaces a call to printf that has only one parameter (a plain string ending in a newline) with a call to puts. puts doesn't have to scan the string for format specifiers, so the code should run faster than it would when printf is called.</div>
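A small file for seeing this yourself: compile it with gcc -O2 -S, then again with -fno-builtin added, and compare the calls in the two .s files. The function names are hypothetical. The second function is a useful contrast: as far as I know, GCC only swaps in puts when printf's return value is unused, because puts returns a different value.

```c
#include <stdio.h>

/* No conversion specifiers, ends in '\n', return value unused: with
 * builtins enabled (the default), GCC typically emits
 * puts("Hello World!") here. With -fno-builtin it stays a printf call. */
void hello(void)
{
    printf("Hello World!\n");
}

/* printf's return value (the number of characters written) is used
 * here, so GCC keeps the printf call even with builtins enabled. */
int hello_counted(void)
{
    return printf("Hello World!\n");
}
```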
<div>
<br /></div>
<h3>
The -g Flag</h3>
<div>
The -g flag tells GCC to insert debugging information into the binary, which in turn makes the binary much larger. It adds several extra sections, such as .debug_info, .debug_abbrev and .debug_str. The binary also contains a line number table (the .debug_line section) that lets debuggers map the compiled instructions back to their C source lines, along with other information such as variable names and their types, effectively allowing debuggers to relate the assembly code back to the original source code.</div>
<div>
<br /></div>
<h3>
printf(), With Many Arguments</h3>
<div>
In this section, we didn't change any flag passed to GCC; instead we simply kept adding more arguments to printf to see what happens. It turns out that after 7 arguments, the compiler starts pushing them onto the stack instead. This is because the calling convention only reserves a limited number of registers for arguments: there are only so many registers in the processor, and functions with that many arguments tend to be rare, so it would be wasteful to reserve more registers for parameters.</div>
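The register limit comes from each platform's calling convention: the x86-64 System V ABI passes the first six integer or pointer arguments in rdi, rsi, rdx, rcx, r8 and r9, while AArch64 passes the first eight in x0 to x7 (or their 32-bit halves for int). A hypothetical ten-argument function makes the overflow easy to spot: compile it together with a call to it at -O0 with gcc -S and look at how the last arguments are passed.

```c
/* Hypothetical ten-argument function. Under the x86-64 System V ABI,
 * a..f arrive in registers and g..j are passed on the stack; under the
 * AArch64 procedure call standard, a..h arrive in w0..w7 and only i and
 * j go on the stack. */
long sum_ten(int a, int b, int c, int d, int e,
             int f, int g, int h, int i, int j)
{
    return (long)a + b + c + d + e + f + g + h + i + j;
}
```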
<div>
<br /></div>
<h3>
Calling A Simple Function, With The -O0 Flag</h3>
<div>
Here we had to compare a binary where printf() was called directly from main() with one where the printf() call was moved into a separate function, output(), which main() then called. With no optimization, the compiled code mirrors the source: main() doesn't call printf() itself, it calls output(), and output() contains the printf() call.</div>
<div>
<br /></div>
<h3>
Calling A Simple Function, With The -O3 Flag</h3>
<div>
Here we did exactly the same as above, but instead of compiling with no optimizations, we compiled with -O3, GCC's highest standard optimization level. This time main() doesn't have a call to output() at all; instead the compiler inlined output(), so main() makes the printf() call directly. Interestingly, the compiled binary still contains the original output() function. I assume that is because output() is not declared static, so it has external linkage: as far as the compiler knows, code in another file or library might need to call it, so it has to keep a stand-alone copy.</div>
Matthew Bellhttp://www.blogger.com/profile/13557505617294882541noreply@blogger.com0tag:blogger.com,1999:blog-8582633378456179178.post-49171749683942335122017-01-31T17:40:00.000-08:002017-01-31T17:40:24.498-08:00SPO600 - Lab 3 - Assembly LabWithin this lab we were tasked with writing code that prints the numbers 0 to 99 in both x86_64 and aarch64 (arm64) assembly code. However in this write up, I will only focus on the aarch64 assembly code, as both topics overlap.<br />
<br />
In C a task like this is simple, and can be done in 8 lines of code, including formatting.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">#include <stdio.h></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">int main() {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>int i;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>for(i = 0; i < 100; i++) {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>printf("Loop: %d\n", i);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> return 0;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<br />
In both x86_64 and AArch64 assembly, however, it's much more complex. Our assembly program isn't linked against the C standard library (where printf is located), so we instead need to invoke something known as a "syscall" to display our text. A syscall is effectively a function provided by your operating system's kernel, and writing to the console requires one (here, the write syscall). Using C's write() wrapper for that syscall, the code would look like:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">#include <unistd.h></span><br />
<span style="font-family: "courier new" , "courier" , monospace;">#define STDOUT 1</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">#define ZERO_ASCII 48</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">int main() {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>int i;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>for(i = 0; i < 100; i++) {</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>write(STDOUT, "Loop: ", 6);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>if(i > 9){</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>write(STDOUT, (i / 10) + ZERO_ASCII, 1);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>write(STDOUT, (i % 10) + ZERO_ASCII</span><span style="font-family: "courier new" , "courier" , monospace;">, 1);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>write(STDOUT, "\n", 1);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> return 0;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">}</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: inherit;">While this code is more complex, as it has to format the number it is still rather simple, and only 15 lines of code. In AArch64 assembly, a program that does the same thing would look like:</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">.text</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">.globl _start</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">start = 0</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">max = 100</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">_start:</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">/*setup initial loop counter */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">mov x19, start</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">loop:</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">/* Start loop here */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> /* Print the Loop string */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> mov x0, 1 /* file descriptor: 1 is stdout */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> adr x1, loop_msg /* message location (memory address) */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> mov x2, loop_msg_len /* message length (bytes) */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> mov x8, 64 /* write is syscall #64 */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> svc 0 /* invoke syscall */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> mov x20, num_msg_len</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> udiv x21, x19, x20</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> cmp x19, x20</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> b.lt skip</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> mov x0, 1 /* file descriptor: 1 is stdout */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> adr x1, num_msg /* message location (memory address) */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> add x1, x1, x21 /* add the loop count */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> mov x2, 1 /* message length (bytes) */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> mov x8, 64</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> svc 0</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">skip:</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> msub x21, x20, x21, x19</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> mov x0, 1</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> adr x1, num_msg</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> add x1, x1, x21</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> mov x2, 1</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> mov x8, 64</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> svc 0</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> /* Print newline */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> mov x0, 1</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> adr x1, nl_msg</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> mov x2, nl_msg_len</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> mov x8, 64</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> svc 0</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> /* Increment loop */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> add x19, x19, 1</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> /* compare the loop counter (x19) to the max value */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> cmp x19, max</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> /* branch if less then the max */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> b.lt loop</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> mov x0, 0 /* status -> 0 */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> mov x8, 93 /* exit is syscall #93 */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> svc 0 /* invoke syscall */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">.data</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">loop_msg: .ascii "Loop: "</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">loop_msg_len = . - loop_msg</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">num_msg: .ascii "0123456789"</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">num_msg_len = . - num_msg</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">nl_msg: .ascii "\n"</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">nl_msg_len = . - nl_msg</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: inherit;">As you can see, the amount of code required to complete the same task is far bigger, but I'll break down each section of code.</span><br />
<div>
<br />
<span style="font-family: "courier new" , "courier" , monospace;">_start:</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">mov x19, start</span><br />
<div>
<br /></div>
This section of the code is relatively easy to understand, as it simply stores the starting value of our loop counter (start, which is 0) in register x19. You can think of it like the 'i' variable from the C program. The next section of code is the body of the loop, but we're only going to look at half of it for now.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">loop:</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> mov x0, 1 /* file descriptor: 1 is stdout */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> adr x1, loop_msg /* message location (memory address) */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> mov x2, loop_msg_len /* message length (bytes) */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"> mov x8, 64 /* write is syscall #64 */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> svc 0 /* invoke syscall */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"> mov x20, num_msg_len</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> udiv x21, x19, x20</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"> cmp x19, x20</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> b.lt skip</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"> mov x0, 1</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> adr x1, num_msg</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> add x1, x1, x21</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> mov x2, 1</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"> mov x8, 64</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> svc 0</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: inherit;">We can then break this section down more, info the first part of the code.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> mov x0, 1 /* file descriptor: 1 is stdout */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> adr x1, loop_msg /* message location (memory address) */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> mov x2, loop_msg_len /* message length (bytes) */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"> mov x8, 64 /* write is syscall #64 */</span><br />
<span style="font-family: inherit;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> svc 0 /* invoke syscall */</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
This section of the code is equivalent to the first call to write() in the C code, where it prints "Loop: ". x0 holds the first parameter (the file descriptor, 1 for STDOUT), x1 holds the second parameter (a pointer to the block of memory we are printing), and x2 holds the third parameter (the number of bytes to write). We then store the number 64 in x8, as that is the number of the write syscall on AArch64 Linux, and svc 0 on the next line invokes it.<br />
<br />
At this point I would like to note that it is possible to print the whole line with one syscall by building it in a buffer ahead of time. Instead, we chose to do each print separately, as it was easier to model it after our C example. As an added benefit, we can easily extend our code to write values in bases other than base 10: all you would have to do is add more characters to the "num_msg" buffer, since num_msg_len is used as the divisor. This only works as long as every value still fits in two digits, so for numbers up to 99 the base has to be 10 or higher.<br />
<br />
<br />
This next section of the loop body is where we print the first digit of the number, if it exists.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"> mov x20, num_msg_len</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> udiv x21, x19, x20</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"> cmp x19, x20</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> b.lt skip</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"> mov x0, 1</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> adr x1, num_msg</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> add x1, x1, x21</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> mov x2, 1</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"> mov x8, 64</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> svc 0</span><br />
<br />
<br />
We first load the number of characters in our charset into x20, then divide x19 (our current loop count) by x20 (the number of characters) and store the quotient in x21. We then check whether the current loop count is less than the charset size; if it is, we skip printing this digit, as it would be a leading 0. Otherwise we print it in much the same way as before, except that we use pointer arithmetic: by adding the quotient to the num_msg pointer, we single out the individual character to print from our set.<br />
<br />
<br />
Now we're going to look at the section of the code where we print the last digit. This code will be executed every iteration, no matter what.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">skip:</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> msub x21, x20, x21, x19</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"> mov x0, 1</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> adr x1, num_msg</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> add x1, x1, x21</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> mov x2, 1</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"> mov x8, 64</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> svc 0</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;"> mov x0, 1</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> adr x1, nl_msg</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"></span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> mov x2, nl_msg_len</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span></div>
<div>
<span style="font-family: inherit;">This section is almost the same as the previous one, with only one difference, that being the msub call. It gets the remainder of dividing the loop counter by the charset size, and then prints it out. After that our code then prints out the newline.</span></div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
<span style="font-family: inherit;">The last section of code we need to look at is where we increment the loop counter, and actually check to see if we need to loop again.</span></div>
<div>
<br /></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace;"> add x19, x19, 1</span><br />
<span style="font-family: "courier new", courier, monospace;"> cmp x19, max</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> b.lt loop</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: inherit;">This is relatively easy to understand, as we just increment the loop counter (x19) by one, then we compare it with the max value. If it's less then the max, we jump to the beginning of the loop, and run again</span></div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
<span style="font-family: inherit;"><br /></span></div>
<div>
<span style="font-family: inherit;">To conclude this blog post, it has been an interesting introduction to assembly, and I look </span>forward<span style="font-family: inherit;"> to writing more.</span></div>
Matthew Bellhttp://www.blogger.com/profile/13557505617294882541noreply@blogger.com0tag:blogger.com,1999:blog-8582633378456179178.post-43272811424977567752017-01-21T15:17:00.000-08:002017-01-21T15:17:29.515-08:00Compiling SoftwareI decided to compile wget (as the GNU software to compile), and the apache httpd webserver.
Compiling wget was a breeze: all I had to do was run the configure script and then run make. Apache was much the same, other than providing a prefix for where the compiled server would be installed to.Matthew Bellhttp://www.blogger.com/profile/13557505617294882541noreply@blogger.com0