Sunday, April 9, 2017

strfry is totally optimizable

A note: I've always been partial to the idea of "let the code do the talking" and if my last blog post didn't reinforce that, I don't know what does! It was a bit of a mess, and rather rushed.

So I previously remarked that I couldn't optimize strfry. I was wrong. For reference, here is the currently implementation of strfry:

char* strfry (char *string)
{
  static int init;
  static struct random_data rdata;

  if (!init)
    {
      static char state[32];
      rdata.state = NULL;
      __initstate_r (time ((time_t *) NULL) ^ getpid (),
    state, sizeof (state), &rdata);
      init = 1;
    }

  size_t len = strlen (string);
  if (len > 0)
    for (size_t i = 0; i < len - 1; ++i)
      {
int32_t j;
__random_r (&rdata, &j);
j = j % (len - i) + i;

char c = string[i];
string[i] = string[j];
string[j] = c;
      }

  return string;
}

As a simple test case, the following code was run:

#define STR_LEN 1000000
#define LOOP_COUNT 0000

#define CHARSET_SIZE 26

int main(){
char charset[CHARSET_SIZE];
char text[STR_LEN];
int i;
//fill the charset
for(i = 0; i < CHARSET_SIZE; i++){
charset[i] = 'a' + i;
}
//fill the text
for(i = 0; i < STR_LEN; i ++ CHARSET_SIZE){
memcpy((void*)text + i, charset, CHARSET_SIZE);
}
text[i - CHARSET_SIZE] = '\0';

for(i = 0; i < LOOP_COUNT; i++){
strfry(text);
}

}

And the results when timed (compiled with gcc, and statically linked the binary for faster code execution):

real 3m3.960s
user 3m3.676s
sys 0m0.020s


Can you see where the function can be optimized? The strlen call is actually unneeded! It iterates though the full string to get the length, and then shuffles the contents of the string, only you don't actually need the length to shuffle the string, we can just iterate till we hit the null character. Our changed loop would look like:

  if (string[0])
    for (size_t i = 1; string[i]; ++i)
      {
int32_t j;
__random_r (&rdata, &j);
j = j % i;

char c = string[i];
string[i] = string[j];
string[j] = c;

      }

as you an see, we only removed the function call, changed both the loop condition and the initial if statement, and then changed how we compute the index to swap with. These are all relatively small changes, but do have a rather significant impact when executed. The results when timed with my modification:

real 2m47.227s
user 2m47.024s

sys 0m0.020s

This is an approximate improvement of 5-10%! Now I can go and try to get this reviewed by my fellow classmates, and see what they say!

No comments:

Post a Comment