Sunday, April 16, 2017

Benchmarking My Changes

In my last blog post I explained how I optimized the strfry() function from glibc; now I have to test for regressions. My earlier results came from an x86 virtual machine on my laptop. The problem is that we're supposed to be targeting aarch64, so now I need to test there as well!

For reference, here is the code I am using to test both the current glibc strfry and my own:

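#define _GNU_SOURCE /* strfry() is a GNU extension declared in <string.h> */
#include <string.h>
#include <stdio.h>

#define STR_LEN 1000000

#define LOOP_COUNT 10000

#define CHARSET_SIZE 26

int main(void)
{
    char charset[CHARSET_SIZE];
    char text[STR_LEN];

    /* Build the lowercase alphabet. */
    for (int i = 0; i < CHARSET_SIZE; i++) {
        charset[i] = (char)('a' + i);
    }

    /* Fill the buffer with whole copies of the alphabet, stopping before a
       copy would run past the end of text. */
    int i;
    for (i = 0; i + CHARSET_SIZE <= STR_LEN; i += CHARSET_SIZE) {
        memcpy(text + i, charset, CHARSET_SIZE);
    }

    /* Terminate right after the last complete copy. */
    text[i] = '\0';

    /* Shuffle the same string repeatedly; this loop is what gets timed. */
    for (i = 0; i < LOOP_COUNT; i++) {
        strfry(text);
    }
    return 0;
}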
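
An external timer around this program also counts the buffer setup, not just the strfry calls. As a rough sketch of timing only the loop from inside the process, assuming a POSIX system where clock_gettime and CLOCK_MONOTONIC are available (the helper names elapsed_seconds, baseline_noop, and time_calls are hypothetical and are not part of the benchmark above or of glibc):

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() */
#include <time.h>

/* Seconds between two clock readings. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical do-nothing stand-in with the same signature as strfry;
   handy for exercising the harness without touching the string. */
char *baseline_noop(char *s)
{
    return s;
}

/* Call fn(text) `loops` times and return the wall-clock seconds taken.
   fn can be strfry, a replacement implementation, or baseline_noop. */
double time_calls(char *(*fn)(char *), char *text, int loops)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < loops; i++) {
        fn(text);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    return elapsed_seconds(start, end);
}
```

With _GNU_SOURCE defined before including <string.h>, a call such as time_calls(strfry, text, LOOP_COUNT) would time the same loop as the benchmark while leaving the setup out of the measurement.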

To keep my benchmarks consistent with my peers', I will first re-run my tests on Xerxes, our x86_64 server.

Results of running the current glibc strfry on Xerxes:
real 4m15.630s
user 4m15.525s
sys 0m0.003s

And the results for running my implementation on Xerxes:
real 2m54.665s
user 2m54.609s
sys 0m0.005s

This is roughly a 32% reduction in runtime, better even than the 5-10% I was getting on my virtual machine! However, when tested on Betty (our aarch64 server), the results are less impressive.

The current implementation:
real 3m21.161s
user 3m21.160s
sys 0m0.000s

Mine:
real 3m18.495s
user 3m18.500s
sys 0m0.000s

This is only a roughly 1% improvement. I can only assume that is because aarch64 is a highly optimized architecture that already handles large sets of data well, which leaves my change less to speed up.
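
The percentages quoted above can be rechecked from the real times. A small sketch of that arithmetic follows; the helper names to_seconds and percent_improvement are made up for this example:

```c
/* Convert a "XmY.YYYs" reading, split into minutes and seconds, to seconds. */
double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Percentage by which the candidate runtime is lower than the baseline. */
double percent_improvement(double baseline, double candidate)
{
    return (baseline - candidate) / baseline * 100.0;
}
```

Plugging in the real times gives about 31.7% for Xerxes (4m15.630s versus 2m54.665s) and about 1.3% for Betty (3m21.161s versus 3m18.495s).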


You can view my exact code in my SPO600 GitHub pull request.
