An Arrogant Programmer's Adventures in Assembly (and C): April 2017

Sunday, April 16, 2017

Benchmarking My Changes

In my last blog post I explained how I was able to optimize the strfry() functions from glibc, and now I have to test for regressions. I was testing my last results on an x86 virtual machine on my laptop. The issue with that is that we're supposed to be targeting our code for aarch64, so now I need to test on there!

For reference, the code I am using to test both the current glibc's strfry and my own:

#include <string.h>
#include <stdio.h>

#define STR_LEN 1000000

#define LOOP_COUNT 10000

#define CHARSET_SIZE 26

int main()
{
char charset[CHARSET_SIZE];
char text[STR_LEN];
for(int i = 0; i < CHARSET_SIZE; i++){
charset[i] = (char)('a' + i);
}
int i;
for(i = 0; i < STR_LEN;i += CHARSET_SIZE){
memcpy((void*)text + i, charset, CHARSET_SIZE);
}

text[i - CHARSET_SIZE] = '\0';

for(i = 0; i < LOOP_COUNT; i++){
strfry(text);
}
}

To keep my benchmarks consistent with my peers, I will first re-run my tests on Xerxes, our x86_64 server here to use.

Results of running the current glibc strfry on Xerxes:
real 4m15.630s
user 4m15.525s
sys 0m0.003s

And the results for running my implementation on Xerxes:
real 2m54.665s
user 2m54.609s
sys 0m0.005s

This is a roughly 30% runtime improvement, better even then the 5-10% I was getting on my virtual machine! However when tested on Betty (our aarch64 server), the results are less impressive.

The current implementation:
real 3m21.161s
user 3m21.160s
sys 0m0.000s

Mine:
real 3m18.495s
user 3m18.500s
sys 0m0.000s

This is only a roughly 1% improvement, and I can only assume that is because aarch is a highly optimized architecture, and is better able to handle larger sets of data.

You can view my exact code on my spo600 github pull request here

Sunday, April 9, 2017

strfry is totally optimizable

A note: I've always been partial to the idea of "let the code do the talking" and if my last blog post didn't reinforce that, I don't know what does! It was a bit of a mess, and rather rushed.

So I previously remarked that I couldn't optimize strfry. I was wrong. For reference, here is the currently implementation of strfry:

char* strfry (char *string)
{
static int init;
static struct random_data rdata;

if (!init)
{
static char state[32];
rdata.state = NULL;
__initstate_r (time ((time_t *) NULL) ^ getpid (),
state, sizeof (state), &rdata);
init = 1;
}

size_t len = strlen (string);
if (len > 0)
for (size_t i = 0; i < len - 1; ++i)
{
int32_t j;
__random_r (&rdata, &j);
j = j % (len - i) + i;

char c = string[i];
string[i] = string[j];
string[j] = c;
}

return string;
}

As a simple test case, the following code was run:

#define STR_LEN 1000000
#define LOOP_COUNT 0000

#define CHARSET_SIZE 26

int main(){
char charset[CHARSET_SIZE];
char text[STR_LEN];
int i;
//fill the charset
for(i = 0; i < CHARSET_SIZE; i++){
charset[i] = 'a' + i;
}
//fill the text
for(i = 0; i < STR_LEN; i ++ CHARSET_SIZE){
memcpy((void*)text + i, charset, CHARSET_SIZE);
}
text[i - CHARSET_SIZE] = '\0';

for(i = 0; i < LOOP_COUNT; i++){
strfry(text);
}

}

And the results when timed (compiled with gcc, and statically linked the binary for faster code execution):

real 3m3.960s
user 3m3.676s
sys 0m0.020s

Can you see where the function can be optimized? The strlen call is actually unneeded! It iterates though the full string to get the length, and then shuffles the contents of the string, only you don't actually need the length to shuffle the string, we can just iterate till we hit the null character. Our changed loop would look like:

if (string[0])
for (size_t i = 1; string[i]; ++i)
{
int32_t j;
__random_r (&rdata, &j);
j = j % i;

char c = string[i];
string[i] = string[j];
string[j] = c;

}

as you an see, we only removed the function call, changed both the loop condition and the initial if statement, and then changed how we compute the index to swap with. These are all relatively small changes, but do have a rather significant impact when executed. The results when timed with my modification:

real 2m47.227s
user 2m47.024s

sys 0m0.020s

This is an approximate improvement of 5-10%! Now I can go and try to get this reviewed by my fellow classmates, and see what they say!

Monday, April 3, 2017

Optimizing glibc

So for our course's final assignment, we are tasked with finding a function in glibc, and then to optimize it. At first I assumed this project would have been a relatively easy (spoiler: it's not), and decided to leave picking my function to the last minute (literally, I picked it in class when others were presenting on their selected functions). I ended up stumbling upon the function "strstr", a function used to locate the first occurrence of a string inside of another string. After reviewing the contents of the function however I was forced to conclude that I (with my current knowledge) am unable to further optimize it.

Instead I decided to look at the function "strfry" where it shuffles the contents of the string. At first glance I thought it would also be optimizable, but now I'm not so sure. I need to discuss this with our professor.