BalaBit blog

GUARDING YOUR BUSINESS

IOCCC vs Clean Code

Sunday, November 27, 2011 @ 10:11 PM Author: athos

I found this piece of code at the website of the International Obfuscated C Code Contest. (Did you know the contest is open for this year?) Since I read tons of books, papers and even videos about clean code nowadays, I couldn’t resist refactoring it, just to see how much a heavy code cleanup can improve a source code that was intentionally written to be obscrure as possible, and of course, to abuse both IOCCC and Clean Code as hard as can be. :-)

The original code

main(n,i,a,m){while(i=++n) for(a=0;a<i?a=a*8+i%8,i/=8,m=a==i|a/8==i,1:(n-++m||printf("%on",n))&&n%m;);}

To compile it using GCC, the following command can be used:

$ gcc -ansi -o makarios makarios.c

The only thing it does is slowly printing a bunch of numbers to stdout in an infinite loop. Note that this is pretty much all the information that can be extracted from the code at first blink, either by running it or by reading it.

Refactoring without tests is suicide

Before refactoring it, I needed to make it testable. Since this is a very simple program, I decided to test it end-to-end, by saving a reference output to a textfile (say, for example, the first 150 numbers it prints) and by breaking the infinite loop (of course it would terminate once n overflows, but that’s way too slow):

main(n,i,a,m){while(i=++n) #define STOP_AT 522233 main(n,i,a,m){while((i=++n) <= STOP_AT) for(a=0;a<i?a=a*8+i%8,i/=8,m=a==i|a/8==i,1:(n-++m||printf("%on",n))&&n%m;);}

Note that it prints numbers in octal base, and 522233 is the 150th number in the reference output in decimal base. It could’ve been written in octal, but in the end I’d like to get rid of this hard-coded constant, so it will do the job well for now. Let’s see if it still generates the expected output:

$ gcc -ansi -Wall -o ./makarios makarios.c && ./makarios 2>&1 | diff expected_output.txt - && echo OK

After each and every change, I ran the test and created a patch only if the test passed. (Find the individual patches at GitHub.)

Recovering structure

So do you have any idea how this code works? Me neither. But there are a couple of things to note:

  • Some indentation might come handy in the future.
  • Local variables are defined as arguments to main(), OMG!
  • The output depends on n (which will always be the number of command line arguments to the program), but all the other variables are overwritten sooner or later, before used. (a and i are trivial to see, and since i is always positive and a is 0, a<i will always be true in the beginning of the inner loop, so the positive branch of the ternary is executed first, which means m gets overwritten as well).
  • Easy to see by experiment, this program does not print numbers smaller than the number of command line arguments (note that the output is in octal base), so if one would want to see only numbers greater than eg. 64 (100 in octal base), the program should be invoked like this:
    ./makarios `seq 1 64`

    This is not very useful, so I decided to drop this feature. Variable n will start from 1, like when the program is invoked without arguments (making the first parameter of main() be 1).

  • There are a couple of compiler warnings, these are better cleaned up so we can see if we mess up something.

After a couple of minutes, my code looked like this:

#include <stdio.h> #define STOP_AT 522233 main(n,i,a,m){while((i=++n) <= STOP_AT) for(a=0;a<i?a=a*8+i%8,i/=8,m=a==i|a/8==i,1:(n-++m||printf("%on",n))&&n%m;);} int main(int argc, char* argv[]) {  int n = 1,  i, a, m;  while((i=++n) <= STOP_AT)  {  for(a=0;a<i?a=a*8+i%8,i/=8,m=a==i|a/8==i,1:(n-++m||printf("%on",n))&&n%m;)  {  }  }  return 0; }

Next I converted the for loop into a while loop and moved a and m inside the outer loop since they are not used outside of it:

#include <stdio.h> #define STOP_AT 522233 int main(int argc, char* argv[]) { int n = 1, i, a, m;  i; while((i=++n) <= STOP_AT) { for(a=0;a<i?a=a*8+i%8,i/=8,m=a==i|a/8==i,1:(n-++m||printf("%on",n))&&n%m;)  int a = 0,  m;  while (a<i?a=a*8+i%8,i/=8,m=a==i|a/8==i,1:(n-++m||printf("%on",n))&&n%m) { } } return 0; } 

Now if that ugly expression in the condition of the loop would be inside the block, it could be easily broken up:

 int a = 0, m;  m, is_not_ready; while (a<i?a=a*8+i%8,i/=8,m=a==i|a/8==i,1:(n-++m||printf("%on",n))&&n%m) } do {  is_not_ready = a<i?a=a*8+i%8,i/=8,m=a==i|a/8==i,1:(n-++m||printf("%on",n))&&n%m; } while (is_not_ready); 

First, replace the ternary with an if statement:

do { is_not_ready = a<i?a=a*8+i%8,i/=8,m=a==i|a/8==i,1:(n-++m||printf("%on",n))&&n%m;  if (a < i)  {  is_not_ready = a=a*8+i%8,i/=8,m=a==i|a/8==i,1;  }  else  {  is_not_ready = (n-++m||printf("%on",n))&&n%m;  } } while (is_not_ready); 

Notice that I did not even try to understand the hows and whys. The only thing I did so far was to analyze the code, breaking up expressions, replacing them to more readable alternatives, but I have no ghost of an idea what this code does and why it does it so. Let’s continue the work by breaking up the assignment in the true branch of the if statement (from now on, a good cheat sheet of operator precedence will be a life saver):

if (a < i) { is_not_ready = a=a*8+i%8,i/=8,m=a==i|a/8==i,1; a = a*8 + i%8; i /= 8; m = (a == i) | (a/8 == i); is_not_ready = 1; } else { is_not_ready = (n-++m||printf("%on",n))&&n%m; } 

Notice the value assigned to m? Those are two booleans with a binary OR in the middle. That’s ugly, using a logical operator instead will change nothing:

m = (a == i) | (a/8 == i); m = (a == i) || (a/8 == i); 

Now breaking up the assignments inside the else block: printf() will always return non-zero making the expression inside parenthesis true, so is_not_ready depends only from n%m. Since it’s a boolean expression, I added a comparison to zero:

is_not_ready = (n-++m||printf("%on",n))&&n%m; (n-++m||printf("%on",n)); is_not_ready = 0 != n%m; 

Due to lazy evaluation, printf() is only executed when n-++m equals to zero. To make that expressed in code, an if statement can be used instead of the ignored boolean expression:

(n-++m||printf("%on",n)); if (0 == n-++m)  printf("%on", n); is_not_ready = 0 != n%m;

To make the condition more readable:

if (0 == n-++m) if (++m == n) printf("%on", n); 

So far, so good:

 #include <stdio.h> #define STOP_AT 522233 int main(int argc, char* argv[]) { int n = 1, i; while((i=++n) <= STOP_AT) { int a = 0, m, is_not_ready; do { if (a < i) { a = a*8 + i%8; i /= 8; m = (a == i) || (a/8 == i); is_not_ready = 1; } else { if (++m == n) printf("%on", n); is_not_ready = 0 != n%m; } } while (is_not_ready); } return 0; } 

This is kinda readable, isn’t it? No! This is not much better than it was in its 105 bytes size.

Let’s do some Command-query separation here by extracting state-changing operations from conditionals.

if (++m == n) ++m; if (m == n) printf("%on", n); 

Pretty straightforward. Now ++n and the assignment to i require slightly more work:

int main(int argc, char* argv[]) { int n = 1, i;  int n = 2; while((i=++n) <= STOP_AT)  while(n <= STOP_AT) { int a = 0,  i = n, m, is_not_ready; do { if (a < i) { a = a*8 + i%8; i /= 8; m = (a == i) || (a/8 == i); is_not_ready = 1; } else { ++m; if (m == n) printf("%on", n); is_not_ready = 0 != n%m; } } while (is_not_ready);  ++n; } return 0; }

Actually the outer loop can be written shorter as a for loop:

int main(int argc, char* argv[]) { int n = 2;  int n; while(n <= STOP_AT)  for (n = 2; n <= STOP_AT; ++n) { int a = 0, i = n, m, is_not_ready; do { if (a < i) { a = a*8 + i%8; i /= 8; m = (a == i) || (a/8 == i); is_not_ready = 1; } else { ++m; if (m == n) printf("%on", n); is_not_ready = 0 != n%m; } } while (is_not_ready); ++n; } return 0; }

Now we can get rid of is_not_ready, it was just a temporary helper to break up some magical expressions:

 for (n = 2; n <= STOP_AT; ++n) { int a = 0, i = n, m, is_not_ready;  m; do { if (a < i) { a = a*8 + i%8; i /= 8; m = (a == i) || (a/8 == i); is_not_ready = 1; } else { ++m; if (m == n) printf("%on", n); is_not_ready = 0 != n%m;  if (0 == n%m)  break; } } while (is_not_ready);  } while (1); }

For an infinite loop, it does not make much difference if it’s a post-test or a pre-test:

do while (1) { if (a < i) { a = a*8 + i%8; i /= 8; m = (a == i) || (a/8 == i); } else { ++m; if (m == n) printf("%on", n); if (0 == n%m) break; } } while (1); }

If you look at carefully, that’s two inner loops for the price of one:

  • Variable a is starting from 0 and is compared to i which is always positive. This implies that in every iteration of the outer loop the positive branch of the if statement is executed first.
  • The negative branch of the if statement changes neither variables read by the positive branch, nor those used in the condition.
  • In other words, once the negative branch is executed during an iteration of the outer loop, the positive one is never executed again in that iteration.
  • These together imply that in every iteration of the outer loop, the positive branch of the if statement is executed some times, then the negative branch, which finally exits from the inner loop. I.e. during execution, practically there are two separate loops.
  • Separating them in code as well makes understanding the behavior of the program easier.
while (1) { if (a < i) while (a < i) { a = a*8 + i%8; i /= 8; m = (a == i) || (a/8 == i); } else while (1) { ++m; if (m == n) printf("%on", n); if (0 == n%m) break; } } 

Now the last compiler warning can be easily eliminated once we notice that m is not used inside the first inner loop, but it’s always overwritten before evaluating the loop condition again, which means it can be extracted to be after the loop (knowing that for every possible values of i and a the loop iterates at least one, so m is never uninitialized before the second inner loop).

while (a < i) { a = a*8 + i%8; i /= 8; m = (a == i) || (a/8 == i); } m = (a == i) || (a/8 == i); 

Now let’s take care of the second inner loop! Reverting back to a post-test loop (getting rid of always true conditional):

while (1) do { ++m; if (m == n) printf("%on", n);  if (0 == n%m) break; } } while (0 != n%m); 

Notice that conditional printing can be extracted because the loop will either exit before the condition of printing could become true or increment m until it reaches n, but when the latter happens, the loop will break anyways.

do { ++m; if (m == n) printf("%on", n); } while (0 != n%m); if (m == n)  printf("%on", n); 

Now m is used in two roles: sometimes it holds a boolean value, sometimes it’s used in arithmetical expressions. This violates Single Responsibility Principle, so let’s get rid of one of the roles. Notice that when m is 0, the do-while loop will iterate only once, and then no printing will happen since n is always greater than 1 and m will be equal to 1, and 1 is a divisor of any n. Which means we can skip the do-while loop when the magical expression after the first inner loop is false, and start m from 1 when it’s not:

m = (a == i) || (a/8 == i); if (!((a == i) || (a/8 == i)))  continue; m = 1; 

There’s one more tiny thing disturbing me: I don’t really like post-test loops when they’re used unnecessarily, so let’s get rid of this one:

m = 1; do m = 2; while (0 != n%m) { ++m; } while (0 != n%m); } 

The code so far:

#include <stdio.h> #define STOP_AT 522233 int main(int argc, char* argv[]) { int n; for (n = 2; n <= STOP_AT; ++n) { int a = 0, i = n, m; while (a < i) { a = a*8 + i%8; i /= 8; } if (!((a == i) || (a/8 == i))) continue; m = 2; while (0 != n%m) { ++m; } if (m == n) printf("%on", n); } return 0; }

Cleaning the code

Now that’s clean, isn’t it? Of course not! For instance, nothing in it has a nice, explanatory name. It’s just a messy mix of one-letter variables, magical expressions and a spaghetty of loops.

Notice that m is only used in the second half of the outer loop starting from the assignment and ending at the conditional of the if statement. If those lines would be in a separate function, m could be a local variable and n would be a parameter. Similar can be done for the first part starting from the declaration of a and i and ending at the first if statement. Note that the two loops have no variables in common, but both can be parametrized by n. Let’s move those loops into functions in the hope they will be easier to understand without their disturbing contexts:

#include <stdio.h> #define STOP_AT 522233 static int magic_algorithm(int number); static int other_magic_algorithm(int number); static void print_octal(int number); int main(int argc, char* argv[]) { int n; for (n = 2; n <= STOP_AT; ++n) { int a = 0, i = n, m; while (a < i) { a = a*8 + i%8; i /= 8; } if (!((a == i) || (a/8 == i)))  if (!magic_algorithm(n)) continue; m = 2; while (0 != n%m) { ++m; } if (m == n) printf("%on", n);  if (other_magic_algorithm(n))  print_octal(n); } return 0; } static int magic_algorithm(int number) {  int a = 0,  i = number;  while (a < i)  {  a = a*8 + i%8;  i /= 8;  }  return (a == i) || (a/8 == i); } static int other_magic_algorithm(int number) {  int m;  m = 2;  while (0 != number%m)  {  ++m;  }  return m == number; } static void print_octal(int number) {  printf("%on", number); }

That printf() with the format string are ugly implementation details that shoul be buried inside a small, well named function somewhere at the bottom.

The first loop is called magic_algorithm() and the second is other_magic_algorithm() from now. I’m sure they could be given more rational names, but coming up with better names would require understanding the code, which I don’t want to do without making it obvious at the same time, so let’s continue refactoring.

The easiest thing to do is simplifying main() by joining the two if statements:

for (n = 2; n <= STOP_AT; ++n) { if (!magic_algorithm(n)) continue;  if (other_magic_algorithm(n))  if (magic_algorithm(n) && other_magic_algorithm(n)) print_octal(n); }

Now main() is pretty obvious to understand: it prints numbers that are accepted by both of our magic algorithms in octal format. The question is, what and how do these magic algorithms do?

The second is simplier: by extracting the condition of the while loop into a well named function, it becomes almost obvious:

static int is_divisable(int dividend, int divisor); static int other_magic_algorithm(int number) { int m; m = 2; while (0 != number%m)  while (!is_divisable(number, m)) { ++m; } return m == number; } static int is_divisable(int dividend, int divisor) {  /* FIXME: handle division by zero */  return 0 == dividend % divisor; } 

Note that there’s a reason this function is static: it’s not reusable, becuase it does not handle unexpected input, that’s why a FIXME comment is added. Now let’s focus on refactoring.

After extracting is_divisable(), the role of m becomes clear, so let’s find a good name for it:

static int other_magic_algorithm(int number) { int m;  int smallest_divisor = 2; m = 2; while (!is_divisable(number, m))  while (!is_divisable(number, smallest_divisor)) { ++m;  ++smallest_divisor; } return m == number;  return smallest_divisor == number; }

What do you call a number that equals to its smallest divisor greater than 1?

#include <stdio.h> #define STOP_AT 522233 static int magic_algorithm(int number); static int other_magic_algorithm(int number); static int is_prime(int number); static void print_octal(int number); int main(int argc, char* argv[]) { int n; for (n = 2; n <= STOP_AT; ++n) { if (magic_algorithm(n) && other_magic_algorithm(n))  if (magic_algorithm(n) && is_prime(n)) print_octal(n); } return 0; } static int is_divisable(int dividend, int divisor); static int other_magic_algorithm(int number) is_prime(int number) { int smallest_divisor = 2; while (!is_divisable(number, smallest_divisor)) { ++smallest_divisor; } return smallest_divisor == number; } 

Do you see Single Responsibility Principle being violated? I do:

static int is_divisable(int dividend, int divisor); static int find_smallest_divisor_greater_than_one(int number); static int is_prime(int number) {  /* FIXME: handle 0 and 1 */  return find_smallest_divisor_greater_than_one(number) == number; } static int is_divisable(int dividend, int divisor); static int find_smallest_divisor_greater_than_one(int number) {  /* FIXME: handle negative numbers */ int smallest_divisor = 2; while (!is_divisable(number, smallest_divisor)) { ++smallest_divisor; } return smallest_divisor == number;  return smallest_divisor; }

Finding the smallest divisor and deciding if a number is prime are at different abstraction levels. Besides, it was never mentioned in the code that 1 is not considered a divisor.

Don’t worry about function call overhead, the first thing compilers will do when encountering this code will be to make is_divisable() and find_smallest_divisor_greater_than_one() inline. Then why did we extract them? Because this code is being written for humans, not compilers!

Now to the other magic function.

What is 42 % 10? It’s 2 of course. What is X % 10? Yes, it’s the last digit of X in base 10. Now replace 10 with 8, and you’re doing the same in octal base. This may help a lot to understand what to extract from the other magic algorithm and how to name the functions extracted:

static int get_last_octal_digit(int number); static int remove_last_octal_digit(int number); static int magic_algorithm(int number) { int a = 0, i = number;  i = number,  digit; while (a < i) { a = a*8 + i%8; i /= 8;  digit = get_last_octal_digit(i);  a = a*8 + digit;  i = remove_last_octal_digit(i); } return (a == i) || (a/8 == i);  return (a == i) || (remove_last_octal_digit(a) == i); } static int get_last_octal_digit(int number) {  return number % 8; } static int remove_last_octal_digit(int number) {  return number / 8; }

This starts to make sense, so let’s continue with this octal digit theory:

 static int get_last_octal_digit(int number); static int append_octal_digit(int number, int digit); static int remove_last_octal_digit(int number); static int magic_algorithm(int number) { int a = 0, i = number, digit; while (a < i) { digit = get_last_octal_digit(i); a = a*8 + digit;  a = append_octal_digit(a, digit); i = remove_last_octal_digit(i); } return (a == i) || (remove_last_octal_digit(a) == i); } static int get_last_octal_digit(int number) { return number % 8; } static int append_octal_digit(int number, int digit) {  /* FIXME: handle invalid digits */  return number * 8 + digit; } static int remove_last_octal_digit(int number) { return number / 8; }

(Sorry, that’s another FIXME added.)

Now the roles of those variables are getting clearer: in every iteration, a digit is removed from the end of i and is appended to the end of a. This will make a contain the digits of the given number in reversed order, though the algorithm is stopped when reaching the digit in the middle.

static int get_last_octal_digit(int number); static int append_octal_digit(int number, int digit); static int remove_last_octal_digit(int number); static int magic_algorithm(int number) { int a = 0, i = number, digit;  int reversed_digits = 0,  remaining_digits = number; while (a < i)  while (reversed_digits < remaining_digits) { digit = get_last_octal_digit(i); a = append_octal_digit(a, digit); i = remove_last_octal_digit(i);  int digit = get_last_octal_digit(remaining_digits);  reversed_digits = append_octal_digit(reversed_digits, digit);  remaining_digits = remove_last_octal_digit(remaining_digits); } return (a == i) || (remove_last_octal_digit(a) == i);  return (reversed_digits == remaining_digits)  || (remove_last_octal_digit(reversed_digits) == remaining_digits); } 

When the digits of a number in a base remain the same when the order is reversed, then the number is called palindromic in that base. This algorithm is a test for the octal base. Note that the algorithm stops moving the digits when it reaches the one in the middle. That’s why the expression in the return statement is built from two parts: one is for an odd number of digits, the other is for the even case. This is enough to rename magic_algorithm() (oops, I noticed a new FIXME again):

#include <stdio.h> #define STOP_AT 522233 static int magic_algorithm(int number); static int is_palindromic_in_octal_base(int number); static int is_prime(int number); static void print_octal(int number); int main(int argc, char* argv[]) { int n; for (n = 2; n <= STOP_AT; ++n) { if (magic_algorithm(n) && is_prime(n))  if (is_palindromic_in_octal_base(n) && is_prime(n)) print_octal(n); } return 0; } static int get_last_octal_digit(int number); static int append_octal_digit(int number, int digit); static int remove_last_octal_digit(int number); static int magic_algorithm(int number) is_palindromic_in_octal_base(int number) {  /* FIXME: handle negative numbers */ int reversed_digits = 0, remaining_digits = number; while (reversed_digits < remaining_digits) { int digit = get_last_octal_digit(remaining_digits); reversed_digits = append_octal_digit(reversed_digits, digit); remaining_digits = remove_last_octal_digit(remaining_digits); } return (reversed_digits == remaining_digits) || (remove_last_octal_digit(reversed_digits) == remaining_digits); } static int get_last_octal_digit(int number) { return number % 8; } static int append_octal_digit(int number, int digit) { /* FIXME: handle invalid digits */ return number * 8 + digit; } static int remove_last_octal_digit(int number) { return number / 8; }

Now the while loop could be extracted from is_palindromic_in_octal_base() (remember SRP?), but that would require:

  • output parameters: ugly
  • iterating through the whole number to calculate reversed order to the end: might be slow, notice that the original author might have done it this way to save half of the steps
  • structs and pointers: overkill
  • macro magic: can be confusing

In short: the code is not clean, but we cannot refactor it better. Let’s admit this failure by adding some explanatory comments to this function:

static int is_palindromic_in_octal_base(int number) { /* FIXME: handle negative numbers */ int reversed_digits = 0, remaining_digits = number;  /* NOTE: The loop will stop when the digit in the middle is reached. */ while (reversed_digits < remaining_digits) { int digit = get_last_octal_digit(remaining_digits); reversed_digits = append_octal_digit(reversed_digits, digit); remaining_digits = remove_last_octal_digit(remaining_digits); }  /* NOTE: For an odd number of digits, reversed_digits contains  an extra digit (coming from the middle). */ return (reversed_digits == remaining_digits) || (remove_last_octal_digit(reversed_digits) == remaining_digits); }

After these steps, I realized the need for a Makefile so I could extract reusable code to new compilation units (in addition, I created some quick and dirty unit tests which you can find at GitHub):

lib/primes.h:

#ifndef _PRIMES_H #define _PRIMES_H int is_prime(int number); #endif 

lib/primes.c:

#include "primes.h" static int find_smallest_divisor_greater_than_one(int number); int is_prime(int number) { /* FIXME: handle 0 and 1 */ return find_smallest_divisor_greater_than_one(number) == number; } static int is_divisable(int dividend, int divisor); static int find_smallest_divisor_greater_than_one(int number) { /* FIXME: handle negative numbers */ int smallest_divisor = 2; while (!is_divisable(number, smallest_divisor)) { ++smallest_divisor; } return smallest_divisor; } static int is_divisable(int dividend, int divisor) { /* FIXME: handle division by zero */ return 0 == dividend % divisor; }

lib/palindromes.h:

#ifndef _PALINDROMES_H #define _PALINDROMES_H int is_palindromic_in_octal_base(int number); #endif 

lib/palindromes.c:

#include "palindromes.h" static int get_last_octal_digit(int number); static int append_octal_digit(int number, int digit); static int remove_last_octal_digit(int number); int is_palindromic_in_octal_base(int number) { /* FIXME: handle negative numbers */ int reversed_digits = 0, remaining_digits = number; /* NOTE: The loop will stop when the digit in the middle is reached. */ while (reversed_digits < remaining_digits) { int digit = get_last_octal_digit(remaining_digits); reversed_digits = append_octal_digit(reversed_digits, digit); remaining_digits = remove_last_octal_digit(remaining_digits); } /* NOTE: For an odd number of digits, reversed_digits contains an extra digit (coming from the middle). */ return (reversed_digits == remaining_digits) || (remove_last_octal_digit(reversed_digits) == remaining_digits); } static int get_last_octal_digit(int number) { return number % 8; } static int append_octal_digit(int number, int digit) { /* FIXME: handle invalid digits */ return number * 8 + digit; } static int remove_last_octal_digit(int number) { return number / 8; } 

Reviewing primes.c, it’s obvious it could perform better, but premature optimization is the root of all evil, so let’s just add a comment for now:

static int find_smallest_divisor_greater_than_one(int number) { /* FIXME: handle negative numbers */  /* OPTIMIZATION FIXME: The smallest divisor is always less than  the square root of the number. */ int smallest_divisor = 2; while (!is_divisable(number, smallest_divisor)) { ++smallest_divisor; } return smallest_divisor; }

The most interesting part of the refactoring process ends here.

After the above, I removed all the FIXME comments one by one. Some of them were bad idea, others got converted to unit tests. All of them were easy to fix by adding some parameter checks to various functions.

Next I generalized palindromes.c to work with other bases than octal, then I created a function named find_octal_palindromic_prime_bigger_than(int number), so I could write this:

void print_first_150_octal_palindromic_primes() { int n = 0, i; for (i = 0; i != 150; ++i) { n = find_octal_palindromic_prime_bigger_than(n); print_octal(n); } }

Of course, main() is pretty understandable now:

int main(int argc, char* argv[]) { print_first_150_octal_palindromic_primes(); return 0; }

The details are not very entertaining, though you can find individual commits at GitHub.

Performance

I changed both the refactored and the original version to print all the numbers 10 times. Here are the results (ran on my eeePC, compiled to 64 bit with GCC using -O3):

Obfuscated version: real 0m15.230s user 0m15.180s sys 0m0.010s Clean version: real 0m15.401s user 0m15.130s sys 0m0.040s

The clean version is slightly slower than the original. Sad. But luckily we have a couple of unit tests so we can change the way calculations are done anytime we want, and we already have a good guess on what can be slow: testing each and every number as divisor in primes.c!

After rewriting is_prime() to test only odd numbers and to stop when reaching the square root of the given number, the refactored version gets nearly 14 times faster than the original! The results for 40 iterations of 150 numbers on the same machine:

Obfuscated version: real 0m59.723s user 0m59.610s sys 0m0.000s Clean version: real 0m4.432s user 0m4.410s sys 0m0.000s

Maybe it could be improved further if it didn’t test numbers whether they are palindromic or not, but it generated palindromic numbers at the first place instead.

How much information did the cleanup give?

I mean literally.

That’s the question I wanted to answer when I began writing this post. Lucky we have some mathematical theory to apply here: entropy, ie. the expected value of information (measured in bits). Though Shannon’s entropy could easily be calculated for these files, I’ll be happy with an approximation. By compressing the complete source code (including unit tests but excluding the textfile containing expected output and performance tests) and looking at compressed sizes, theoretical entropy can be approximated well enough. I will compress the source code from four stages: the original, the structure recovered, the function extracted and the final version:

Original makarios.c (commit 0b966486f8)

main(n,i,a,m){while(i=++n) for(a=0;a<i?a=a*8+i%8,i/=8,m=a==i|a/8==i,1:(n-++m||printf("%on",n))&&n%m;);}

Somewhat readable makarios.c (commit bcf7f4046b)

#include <stdio.h> #define STOP_AT 522233 int main(int argc, char* argv[]) { int n; for (n = 2; n <= STOP_AT; ++n) { int a = 0, i = n, m; while (a < i) { a = a*8 + i%8; i /= 8; } if (!((a == i) || (a/8 == i))) continue; m = 2; while (0 != n%m) { ++m; } if (m == n) printf("%on", n); } return 0; }

Function extracted makarios.c (commit 18b94d3210)

#include <stdio.h> #define STOP_AT 522233 static int is_palindromic_in_octal_base(int number); static int is_prime(int number); static void print_octal(int number); int main(int argc, char* argv[]) { int n; for (n = 2; n <= STOP_AT; ++n) { if (is_palindromic_in_octal_base(n) && is_prime(n)) print_octal(n); } return 0; } static int get_last_octal_digit(int number); static int append_octal_digit(int number, int digit); static int remove_last_octal_digit(int number); static int is_palindromic_in_octal_base(int number) { /* FIXME: handle negative numbers */ int reversed_digits = 0, remaining_digits = number; /* NOTE: The loop will stop when the digit in the middle is reached. */ while (reversed_digits < remaining_digits) { int digit = get_last_octal_digit(remaining_digits); reversed_digits = append_octal_digit(reversed_digits, digit); remaining_digits = remove_last_octal_digit(remaining_digits); } /* NOTE: For an odd number of digits, reversed_digits contains an extra digit (coming from the middle). */ return (reversed_digits == remaining_digits) || (remove_last_octal_digit(reversed_digits) == remaining_digits); } static int get_last_octal_digit(int number) { return number % 8; } static int append_octal_digit(int number, int digit) { /* FIXME: handle invalid digits */ return number * 8 + digit; } static int remove_last_octal_digit(int number) { return number / 8; } static int find_smallest_divisor_greater_than_one(int number); static int is_prime(int number) { /* FIXME: handle 0 and 1 */ return find_smallest_divisor_greater_than_one(number) == number; } static int is_divisable(int dividend, int divisor); static int find_smallest_divisor_greater_than_one(int number) { /* FIXME: handle negative numbers */ int smallest_divisor = 2; while (!is_divisable(number, smallest_divisor)) { ++smallest_divisor; } return smallest_divisor; } static int is_divisable(int dividend, int divisor) { /* FIXME: handle division by zero */ return 0 == dividend % divisor; } static void print_octal(int number) { printf("%on", number); }

Modularized version tarball (commit c6c01681e3)

lib/octal_palindromic_primes.c lib/octal_palindromic_primes.h lib/palindromes.c lib/palindromes.h lib/primes.c lib/primes.h makarios.c Makefile tests/test.h tests/test_octal_palindromic_primes.c tests/test_palindromes.c tests/test_primes.c

The amount of information in the source code from each stages:

  • Original: 128 bytes (note that the uncompressed C source is smaller, 105 bytes!)
  • After recovering structure: 265 bytes
  • After extracting functions: 689 bytes
  • After organizing into modules with unit tests: 2872 bytes

To summarize: the “compressed” version of the original code contains more information (the gzip header I guess) than the uncompressed file. That’s how you rule at IOCCC! Adding indentation and using the appropriate control structure for each task doubles the information in the code. Extracting some well named functions, adding some comments and naming variables well introduced 2.6 times more information into the source (6.5 times the original 105 bytes), and organizing everything into standalone units with standalone tests multiplied that amount by another 4.1. (That’s 27.3 times more than the original 105 bytes!)

How much work?

According to commit logs, I made visible progress during the following time intervals:

  • Nov 23 18:50:43 2011 – Nov 23 20:54:56 2011: ~2 hours
  • Nov 23 22:56:29 2011 – Nov 24 01:01:11 2011: ~2 hours (with some breaks)
  • Nov 24 23:04:42 2011 – Nov 25 02:09:22 2011: ~3 hours (with many breaks)
  • Nov 26 21:26:47 2011 – Nov 26 23:21:41 2011: ~2 hours

That’s 9 hours of work which includes writing more detailed commit messages than I usually do, taking notes for this blog post, and committing changes way more frequently than I usually do. Also consider that I’m not an experienced C developer, and to make that even worse, most of the work was done in the night, after getting home from real-life work.

Conclusion

I’m not going to suggest anything, since this was just a funny experiment and what’s more, I don’t even think this coding style matches with the C world. The numbers are interesting though.

As an advice, I can say only one thing: when you’re writing code, please think for a moment how long it will take others to understand it. If your code is not crystal clear just by reading through it (without the need for stopping to think for a millisecond, without scrolling back and forth taking notes etc.), then everytime others look at that code, you’re spending your co-worker’s time (thus your company’s money) on something you could’ve got for a one time prize, or even for free if you’d took that three seconds to extract that loop into a separate function or name that variable something better than a and m