In my last blog post I explained how I was able to optimize the strfry() function from glibc, and now I have to test for regressions. I was testing my last results on an x86 virtual machine on my laptop. The issue is that we're supposed to be targeting aarch64, so now I need to test there as well!
For reference, here is the code I am using to test both the current glibc strfry and my own:
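#define _GNU_SOURCE /* strfry() is a GNU extension */
#include <string.h>
#include <stdio.h>
#define STR_LEN 1000000
#define LOOP_COUNT 10000
#define CHARSET_SIZE 26
int main()
{
    char charset[CHARSET_SIZE];
    char text[STR_LEN];
    /* Build the lowercase alphabet. */
    for(int i = 0; i < CHARSET_SIZE; i++){
        charset[i] = (char)('a' + i);
    }
    int i;
    /* Fill text with whole copies of the alphabet, stopping before a
       copy (plus the terminator) would run past the end of the buffer. */
    for(i = 0; i + CHARSET_SIZE < STR_LEN; i += CHARSET_SIZE){
        memcpy(text + i, charset, CHARSET_SIZE);
    }
    text[i] = '\0';
    /* Shuffle the same string repeatedly so the run is long enough to time. */
    for(i = 0; i < LOOP_COUNT; i++){
        strfry(text);
    }
}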
To keep my benchmarks consistent with my peers', I will first re-run my tests on Xerxes, the x86_64 server we have available.
Results of running the current glibc strfry on Xerxes:
real 4m15.630s
user 4m15.525s
sys 0m0.003s
And the results for running my implementation on Xerxes:
real 2m54.665s
user 2m54.609s
sys 0m0.005s
This is a roughly 32% runtime improvement, better even than the 5-10% I was getting on my virtual machine! However, when tested on Betty (our aarch64 server), the results are less impressive.
The current implementation:
real 3m21.161s
user 3m21.160s
sys 0m0.000s
Mine:
real 3m18.495s
user 3m18.500s
sys 0m0.000s
This is only a roughly 1% improvement. I can only assume that is because aarch64 is a highly optimized architecture and is better able to handle larger sets of data, so the original implementation already runs close to as fast as mine does there.
You can view my exact code on my spo600 GitHub pull request here.