Posts Tagged ‘TDD’

My programming environment

Sunday, January 20, 2013 @ 12:01 PM Author: athos

Back in the old days, machines with less than 32 megabytes of RAM and 6 gigabytes of HDD were more than enough for software development. My first somewhat complex programs (small games, dynamic webpages, etc) have been written on a machine with even less resources than that. So once when I’ve taken a look at my resource usage while working on some PHP code in Eclipse and I’ve seen that the IDE used hundreds of megabytes of RAM and another half gigabyte was allocated by the browser, I began to wonder if it is possible to edit source code with a little less resource footprint but without loosing much of the handy tools an IDE can offer. And it turns out, definitely it is!

Switching to a lightweight desktop environment

Instead of Gnome or Unity, I started to use Fluxbox. I chose a nice theme in the right-click menu and was happy with my new desktop. I can start programs from that menu, or invoke them by typing their names into the small dialog box that pops up when pressing Alt+F2 (and it has tab completion).

I’m used to a couple of keyboard shortcuts for common actions like locking the screen or firing up a terminal so I added the following lines to my ~/.fluxbox/keys file:

Control Alt L :Exec xlock
Control Alt T :Exec /usr/bin/xterm -fa Monaco -fs 12 -fullscreen

As you can see, I switched to XTERM from gnome-terminal, but still wanted to use a nice monospace font.

I also said goodbye to Network Manager. Instead I use a small shell script to connect to wifi networks. I don’t like storing passwords on disk anyways.

Another thing I was missing was the desktop background image. Setting up one in Fluxbox is a little user-unfriendly though. You need to add the following line to your ~/.fluxbox/startup file:

feh --bg-center ~/Pictures/background.jpg &

I noticed that screensaver was no longer started automatically. So I set up my ~/.xscreensaver file using xscreensaver-demo and added the following line to my Fluxbox startup file:

xscreensaver -nosplash &

That’s all it took for me to get a desktop environment that fits my daily needs.

Tweaking the command line environment

Since XTERM does not support tabs, I started to use screen to overcome that. My .screenrc is really simple, it sets up a basic statusline, increases the scrollback buffer size and starts bash in four tabs:

startup_message off
vbell off
autodetach on
defscrollback 5000
hardstatus alwayslastline
hardstatus string '[%c] %w'
screen 0 bash
screen 1 bash
screen 2 bash
screen 3 bash

Speaking of bash, I decided to set up a prompt that tells me everything that might be important:

  • username
  • host name
  • when did the last command finish
  • exit status of the last command
  • laptop battery state if available
  • current working directory
  • current git branch when working in a version controlled directory
  • was the shell started from inside Vim, so I will not invoke vim
    inside it again

I have some requirements about its display as well:

  • stand out from command outputs when scrolling back and forth in the
    terminal window
  • start in a new line, even if the previous command did not output one
  • various fields displayed should be easily spotted, e.g. nice formatting and
    color codes
  • look nice on dark background
  • don’t exceed 90-100 characters so in most environments, stay in one line
    (this implies that stuff that can be long, e.g. current working directory,
    needs to be truncated in a useful way)
  • the next command to be typed in should start close to the beginning of the
    line (e.g. by putting it in a separate line)

An example:

--(  athos@localhost[22:50:02] <0> [D42%] ~/projects/dotfiles @master )--
$
ls /tmp

And the .bashrc snippet to do all that (I know, it’s a bit messy):

LIGHTRED='\[\033[1;31m\]'
LIGHTGREEN='\[\033[1;32m\]'
YELLOW='\[\033[1;33m\]'
LIGHTBLUE='\[\033[1;34m\]'
LIGHTPURPLE='\[\033[1;35m\]'
LIGHTCYAN='\[\033[1;36m\]'
WHITE='\[\033[1;37m\]'
NOCOLOR='\[\033[0m\]'

# Prompt
# ------

# Start with a newline as some commands don't end their output with one
PS1="\n$WHITE--("

# Integration with my .vimrc that exports a variable for shells started from vim
PS1="$PS1 \${PROMPT_EXTRA}"

# username@hostname[hh:mm:ss]
PS1="$PS1 \u@$YELLOW\h$WHITE[\t]"

# Exit code of the latest command: green "<0>" or red ">NON-ZERO<"
PS1="$PS1 \$(__x=\$?; if [[ \$__x -ne 0 ]]; then"
PS1="$PS1 echo -n \"$LIGHTRED>\$__x<\";"
PS1="$PS1 else echo -n \"$LIGHTGREEN<\$__x>\"; fi)"

# Display battery info if available
PS1="$PS1$LIGHTCYAN\$(acpi -b 2>/dev/null |"
PS1="$PS1 sed -r 's/^.*(: ([a-z])[a-z]*, ([0-9]*%)).*\$/ [\\2\\3]/i' |"
PS1="$PS1 grep --color=never -m1 '^ \\\\[.*\\\\]\$')"

# Last max 30 characters of current working directory with a less-than sign
# when truncated
PS1="$PS1 $LIGHTPURPLE"
PS1="$PS1\$(echo '\w' | sed -r 's/^.*.(.{30})/<\1/')"

# When inside a git repo, display the name of the current working branch
PS1="$PS1$LIGHTBLUE\$(git branch 2>/dev/null | grep -m1 --color=never '^[*] '"
PS1="$PS1 | sed 's/^[*] / @/')"

# New command to be entered in a new line
PS1="$PS1 $WHITE)--\n\\\$ $NOCOLOR"

# Adjust the title of the terminal window
case "$TERM" in
xterm*|rxvt*)
    PS1="\[\e]0;\$PROMPT_EXTRA \u@\h: \w\a\]$PS1"
    ;;
*)
    ;;
esac

On different machines I have a .bashrc file set up on I use different colors for the hostname field, so I am less likely to mistakenly execute commands on a remote machine that I intend to run on localhost.

PROMPT_EXTRA is an environment variable that is set by my .vimrc for example, when invoking bash from a hotkey.

The editor: Vim

Switching to Vim from Eclipse took some time for me. Understanding various modes of the editor, the commands for movement and actions in those modes, the commandline, regular expression search and replace tricks, indentation, marks, tabs, windows, well, there is plenty of stuff to learn! The command vimtutor can help a lot with learning the first few steps of using Vim and a cheatsheet you make for yourself containing all the things you feel the need to learn is a must-have as well. For a couple of months I had my cheatsheet written on my desktop background image so it was always with me when I needed it, both at work, at home or even on the train. It looked something like this:

Move: hjkl b w e $                  Go to matching bracket: [( [{ ]) ]}
Modes: <esc> i o a v                Undo, redo: u, <ctrl>+r
Mark, go to mark: mX 'X `X          Autocomplete: <ctrl>+p <ctrl>+n
Copy, cut: yMOVE yy, dMOVE dd       Paste: p P
Go to line 100: 100G :100           Cut and switch to INSERT mode: cMOVE

Search, next, prev: :/regexp, n, N  Search word under cursor: *
Replace between lines 40-42:        :40,42s/search/replace/g
Replace in entire file:             :%s/search/replace/g

Indent: <MOVE >MOVE                 Indent next 7 lines: 7>> 7<<
Reindent: =MOVE                     Reindent next 10 lines: 10==
Replace string constant: ci" ci'    Rewrap next 5 lines of text: 5gqq

Open new tab: :tabedit file.txt     Move between tabs: gt <number>gt
New window: :vsplit|:hsplit file    Switch window: <ctrl>+w hjkl
Move window: <ctrl>+w HJKL          Maximize window: <ctrl>+w _
All windows equal: <ctrl>+w =       Change window size: <ctrl>+w -|+

Exec: :!<shell cmd>                 Exec on all lines: :%!<shell cmd>
Hex-editor: :%!xxd :%!xxd -r

Useful commands:

:edit             :file           :w                  :q :qall :q! :wq
:tabedit          :Explore        :syntax on          :set expandtab
:set number       :set tabstop=4  :set shiftwidth=4   :set autoindent
:set smartindent  :set paste      :set nopaste        :set formatoptions+=r
:set ignorecase   :set smartcase  :set incsearch      :set hlsearch
:nohl

Once I was familiar with the basics, I began to steal ideas from colleagues, and to customize Vim using the ~/.vimrc file. There are a lot of small features that a programmer is likely to turn on in Vim that are not enabled by default (this is an excerpt from my .vimrc, lines beginning with quote are interpreted as comments):

" Turn on line numbers
set number

" Tabs are displayed as 4 characters wide
set tabstop=4

" Indentation is 4 spaces
set shiftwidth=4

" Insert 4 spaces instead of tabs
set expandtab

" Automatic indentation
set autoindent
set smartindent

" Search while typing pattern
set incsearch

" Highlight search pattern matches
set hlsearch

" Automatically format doxygen style comments
set comments=sl:/**,mb:\ *,elx:\ */

" Insert comment leader when hitting enter in insert mode
set formatoptions+=r

" Insert comment leader when hitting o in normal mode
set formatoptions+=o

" Font name for gVim
set guifont=Monaco

" Draw a margin in the 81th column
set colorcolumn=81

" Show current line and column numbers
set ruler

" Command line is 2 lines, so it's easier to type complex commands
set cmdheight=2

" Highlight syntax
syntax on

" Some nice colorscheme
colors evening

" Make constants readable on projector as well
highlight Constant ctermbg=black ctermfg=green

" Always highlight tabs and trailing spaces.
set list
set listchars=tab:>\ ,trail:.,nbsp:.

" Always assume Unix-style line endings
set fileformats=unix

With those settings, Vim can be turned into a very programmer-friendly, basic code editor. To make it even more convenient, you can define your own hotkeys and command aliases:

" Make frequent typos work.
command Q :q
command W :w
command Wq :wq
command WQ :wq
command Wqall :wqall
command WQall :wqall
command WQAll :wqall

" Pressing F5 will invoke a bash shell and return to the window right after.
nnoremap <F5> :! clear; PROMPT_EXTRA='[VIM]' bash<CR><CR>

" Pressing F8 will turn off search highlighting
nnoremap <F8> :nohl<CR>

" Ctrl+o will open explorer in a vertical split window
nnoremap <C-o> :Texplore<CR>

" Set default view mode for explorer to tree
let g:netrw_liststyle=3

" Make explorer hide Vim's swap files
let g:netrw_list_hide='.*\.swp$'

" Pressing 't' will jump to the next search pattern match and bring it to
" the center
nnoremap t nzz
" Pressing 'T' will jump to the last search pattern match and bring it to
" the center
nnoremap T Nzz

Git hotkeys in Vim

I created a shell script and a few hotkeys to make it easy to access information from Git that I’m likely to need when looking at some code in the editor:

" Pressing F2 will show the complete git history of the file
nnoremap <F2> :!clear; vim_git.sh log "%"<CR><CR>

" Pressing F3 will search the git repository for the word under the cursor.
nnoremap <F3> :!clear; vim_git.sh grep "%" "<cword>"<CR>

" Pressing F4 will attempt to git-blame the line under the cursor.
command GitBlame :echo system('vim_git.sh blame '''.expand('%').''' '.line('.'))
nnoremap <F4> :GitBlame<CR>

Accessing manuals and API docs from Vim

By pressing K in normal mode, Vim will attempt to lookup the word under the cursor in man pages, which is nice for C developers. Since I work a lot with PHP and some other languages as well, I’ve created a small script that can extend this functionality. This script can search in an offline HTML copy of PHP’s manual in case of PHP, or can hunt through GLib’s HTML docs or the man pages in case of C/C++, or can use Google to find info about the given identifier in the context of the given programming language. (Needs
Links or similar terminal based browser to display stuff from the web or offline HTML pages.)

I did not put much effort into it, but it’s very easy to extend if you plan to use it. :-) To invoke that script when pressing K instead of the built-in behavior, I added the following to .vimrc:

" Pressing K will invoke vim_lookup_docs.sh in the PATH with the current
" filename and the word under the cursor. That script may attempt to open
" relevant docs according to the given arguments. Mine guesses file type
" by it's extension and either looks up the given word in various offline
" documentation or uses Google depending on file type.
nnoremap K :!vim_lookup_docs.sh "%" "<cword>"<CR><CR>

Autocomplete and code navigation: Vim + Ctags

By default, Vim offers words from all its opened buffers when pressing Ctrl+p or Ctrl+n. This is certainly more than nothing, but with additional tools, Vim can do much better. Ctags is a tool that can generate an index for entire source code directories based on the lexical elements found in the code. Many text editors, including Vim, can use such an index (tag file) to offer autocomplete suggestions or navigate throughout the code. I use Exuberant Ctags which supports a big variety of programming languages.

I like to keep my tag file near the source code I work with, but I do not want to version control it, so I use a name that most of the .gitignore files I come across will match: .tags.pyc. I know, it’s an ugly hack, but it works most of the time. :-)

Of course, I like to keep my tag file up-to-date, but regenerating it every time I save a file in Vim would take too much resources, so I use a wrapper script around ctags that can decide when to re-generate and when and how to update the tag file based on simple heuristics.

Ctags integration in my .vimrc:

" Use .tags.pyc as a tagfile to offer suggestions when pressing Ctrl+P. This
" filename will most likely be ignored by any .gitignore files without any
" additional work. Requires vim_ctags.sh to be on PATH.
set tags=./.tags.pyc;/
command Ctags silent! !vim_ctags.sh ".tags.pyc" "%" &
autocmd BufWritePost *.c,*.cpp,*.cc,*.h,*.hpp,*.hh,*.php* Ctags
autocmd BufWritePost *.js,*.java,*.py,*.pl,*.rb,*.cs,*.sh Ctags
Ctags

When the cursor is on an identifier defined in another file in the source code, Vim can open up its definition by pressing Ctrl+]. To get back to the code using the identifier, Ctrl+t is the shortcut to use. Both these shortcuts work using the same tab or window, which I don’t really like, so I had to override these shortcuts in .vimrc to work with new tabs instead:

" Map usual tag shortcuts to use tabs
map <C-]> :tab split<CR>:exec("tag ".expand("<cword>"))<CR>
map <C-t> :tabprevious<CR>

Green bar, red bar: running unit tests from Vim

Doing TDD involves running tests in every few minutes, sometimes even seconds, so a code editor should make the basic steps of it as instant as possible. Also, the feedback from those tests should be quick and simple, yet informative. Most big IDEs can run tests and display a green or red bar depending on the result of the tests. In case of tests passing, they remain silent, but on failure, they display info about the mismatch between test expectations and actual behavior. I wanted this in Vim as well! I created a shell script again that can guess how to run tests in the directory it is invoked in. It runs tests, displays and saves their output for later use, and returns an exit code according to the result. A simple script in Vim can invoke this shell script and color the statusbar based on the result, and in case of failure, it can open the output of tests in a split window.

Let .vimrc speak:

" Pressing F9 will attempt to invoke unit tests through a shell script named
" vim_run_tests.sh and open the output in a new window in case of failure.
function RunTests()
    :!clear; vim_run_tests.sh /tmp/__test_output
    :if shell_error!=0
        :14split + /tmp/__test_output
        :hi StatusLine ctermfg=darkred ctermbg=white guibg=darkred guifg=white
    :else
        :hi StatusLine ctermfg=darkgreen ctermbg=black guibg=darkgreen guifg=black
    :endif
endfunction
nnoremap <F9> :exec RunTests() <CR><CR>

vim_run_tests.sh looks for either a Makefile or a build.xml, and in case of the former, it uses my other script called bake to do the build out-of VCS tree.

Demo

Here’s a short screencast demonstrating some code editing in my CLI environment, recorded on an eeePC two years old (however, gtk-recordMyDesktop is slow as hell):

Summary

There are tons of plugins for Vim on the Internet I could choose from, and I’m sure they are so much better polished and offer way more features than my hacks within these scripts, but for me, these scripts and configurations have one key advantage compared to having a bunch of make-a-big-ass-IDE-out-of-Vim plugins: I know exactly what they do and when they do it, and they do their work exactly when I tell them to. Besides, scripting and configuring the hell out of your tools is certainly a fun thing to do, and I’d recommend anybody to explore and customize the tools they use to fit their needs and make the work more productive.

Source code for all of these is available at GitHub.

Oh, and don’t forget to check out Algernon’s emacs environment for ideas!

Tests are dangerous

Saturday, October 27, 2012 @ 05:10 PM Author: athos

Do you write tests because you are expected or told to do so, or do you write them to help all those (including yourself) who will sooner or later work with your code? Tests are code as well, so they should be written an maintained with the same amount of care as the production code. Here I share reproductions of some badly written tests I happened to come across here and there. I’m not going to show the original ones since the aim of this post is not to blame the authors of them but to show how easy it is for tests to become harmful when insufficient amount of thought are put into them.

TDD vs. PHP gotcha’s: TDD wins!

How many problems do you see in the following test?

<?php

class ThreeLetterWordCollectorTest extends PHPUnit_Framework_TestCase
{
    public function testThreeLetterWordCollector()
    {
        $cut = new ThreeLetterWordCollector();
        $actual = $cut->collectThreeLetterWords("bar hello foo world");
        $expected = array("foo", "bar");
        $this->assertSame(sort($expected), sort($actual));
    }
}

It’s just a minor issue that the name testThreeLetterWordCollector does not provide any new information about what behavior it is testing. It’s just the words of the tester class in different order. There are probably better names for a variable than $cut as well. (Don’t make me think! It’s an abbreviation of the term “Class Under Test”.) But the real problem here is the dangerous one: this test does NOTHING! The intent here is likely to be to compare the arrays ignoring the order, but PHP’s sort() function does not work the way one would assume knowing most other languages. Actually, it is one of PHP’s worst library functions. Have you read PHP: a fractal of bad design? It’s a must-read. sort() does the work in-place and returns a boolean depending on the success of sorting. (And to make things worse, it compares elements in a crazy undeterministic type-juggling way by default. I’ve written about it a couple of months ago.) So in the end, given that ThreeLetterWordCollector::collectThreeLetterWords() returns an array, the assertion will always pass, regardless of the contents of the array.

If that test and the production code it validates had been written with TDD, then they would have been attempted to be seen failing and once the test passed when it should have failed, then the problem would have been noticed immediately. Don’t trust tests you have never seen fail!

Comparing strings the Unix-y way

Another test written in PHP, but this one actually could have been written equally badly in any popular language:

public function test1()
{
    $expected = __DIR__ . "/fixtures/expected.conf";
    $template = __DIR__ . "/fixtures/template.tpl";
    $actual = __DIR__ . "/fixtures/actual.conf";

    $generator = new ConfigFileGenerator();
    $generator->generateFromTemplate($template, $actual);

    $result = 1;
    system("cmp $expected $actual 2>/dev/null", $result);

    $this->assertTrue($result == 0);
}

Let’s ignore the fact that the naming of this test is even worse than in the first example. But the way it compares the expected and the actual output is gotta be a joke. A surprisingly bad one. I didn’t even know that this cmp command existed! Oh how I miss you, good old Assembly! Now seriously, what does this test print when it fails?

There was 1 failure:

1) ConfigFileGeneratorTest::test1
Failed asserting that false is true.

Thank you, dear test, now I know what requirement is broken and what is the exact problem with it. If this test was written with keeping not just the happy case in mind but the possibility of failing as well, then it would at least read those two files and compare their contents as strings using the assertions provided by the test framework, producing usable output when going red. Maybe all that would be done by a method named assertFileContents(). Or bringing it to the next level: taking SRP into consideration, the config file generator logic could be decoupled from reading and writing files on the disk, so the latter could be easily mocked in the test harness. (It would run faster as well.) And no fixtures in various files would be required, the test could be fully self-contained!

Obfuscating tests using mutable test data

You can’t say I’m picking on tests written in PHP, here’s a Python one:

class FooFinderTest(object):
    def test_foo_finder(self):
        expected_foo_ids = [ 1, 2, 3, 4 ]
        foo_finder = FooFinder()
        self.assert_foo_ids(expected_foo_ids, foo_finder.find(bar=True))
        self.assert_foo_ids(expected_foo_ids, foo_finder.find(bar=False))

    def assert_foo_ids(self, expected_foo_ids, actual_foo_ids):
        if len(expected_foo_ids) != len(actual_foo_ids):
            print "Unexpected number of foo ids: %s" % str(actual_foo_ids)
            sys.exit(1)
        while len(expected_foo_ids) > 0:
            if expected_foo_ids[0] not in actual_foo_ids:
                print "Expected foo id %d to be found" % expected_foo_ids[0]
                sys.exit(1)
            del expected_foo_ids[0]

The question is: what do we expect in the bar=False case? Those knowing Python well can tell the correct answer: this test expects an empty list when bar is False. Those knowing Python well can also tell that whoever writes a test like this, does not know Python, since it has a built-in test framework so no in-house invented wheels are necessary. E.g. the unittest module and its assertEqual() method could have been used instead of this make-shift assertion that removes all the elements from the list which is passed by reference so it becomes empty before assert_foo_ids() is called for the second time.

Tests are first class citizens too!

In theory, tests encourage and help you to refactor. But a test like the next one does the opposite of that. (You know the Game of Life kata, don’t you?)

class GameOfLifeTest(unittest.TestCase):
    def test_any_live_cell_with_less_than_two_neighbors_dies(self):
        world = World()
        world.put_cell(0, 0)
        world.put_cell(1, 0)
        game = GameOfLife(world)
        game.evolve()
        self.assertFalse(game.get_world().is_alive(0, 0))
        self.assertFalse(game.get_world().is_alive(1, 0))

    def test_any_live_cell_with_two_or_three_neighbors_survives(self):
        world = World()
        world.put_cell(1, 1)
        world.put_cell(2, 1)
        world.put_cell(0, 2)
        world.put_cell(1, 2)
        game = GameOfLife(world)
        game.evolve()
        self.assertTrue(game.get_world().is_alive(0, 2))
        self.assertTrue(game.get_world().is_alive(1, 2))

    def test_any_live_cell_with_more_than_three_neighbors_dies(self):
        world = World()
        world.put_cell(0, 1)
        world.put_cell(1, 1)
        world.put_cell(2, 1)
        world.put_cell(0, 2)
        world.put_cell(1, 2)
        world.put_cell(2, 2)
        game = GameOfLife(world)
        game.evolve()
        self.assertFalse(game.get_world().is_alive(1, 1))
        self.assertFalse(game.get_world().is_alive(1, 2))

    def test_any_dead_cell_with_three_neighbors_becomes_alive(self):
        world = World()
        world.put_cell(1, 1)
        world.put_cell(2, 1)
        world.put_cell(0, 2)
        game = GameOfLife(world)
        game.evolve()
        self.assertTrue(game.get_world().is_alive(1, 2))

What would happen if we needed a Cell object instead of pairs of coordinates? How can a new coordinate system be introduced in this code? How many places do we need to touch the test code just to change the signature of GameOfLife‘s constructor? Imagine any feature request for the game along with the necessary refactorings needed to add the feature cleanly and you will see that the test above makes most of them harder to implement since it needs a lot of changes in several places for basically any change that might occur in the interface of the production code. This test is too coupled to the interface of the production code and its collaborators, in other words, it defines classes and methods multiple times but the behavior it aims to validate is lost inside the noise.

Conclusion

Be careful when writing your tests. Having tests is certainly better than not having them, but to make them effective and helpful, one should never forget the goals the tests are written to reach. Tests are code, so standards, metrics, design principles and readability should be applied on them as well.

CITCON Europe 2012: TDD is like sex

Tuesday, October 23, 2012 @ 11:10 PM Author: athos

Continuous Integration and Testing Conference 2012 Europe took place in Budapest on the last weekend. Unlike other conferences, where a few people are selected to share their ideas and experiences with the crowd, it has an interesting format which I have never seen before, called OpenSpace. The event started in the evening of day #1 with a brief introduction on the topic, the conference, the format, the sponsors and such, then the participants introduced themselves to each other by answering three simple questions:

  • Who are you?
  • Where have you heard about the event?
  • Why are you here?

Once we got a clue about each other, an empty agenda was shown hanging on the wall: participants were asked to fill it by proposing topics for discussions. Titles were written on post-its and were attached to the schedule somewhere while being introduced to the crowd. When the schedule got (over)filled, everybody grabbed a marker to vote on the topics and at the same time, the refactoring kicked off: similar discussions and titles got joined, topics were moved around between time slots and rooms based on the number of votes. You’d think that this lead to unusable chaos, but in fact, after a couple of minutes, the schedule evolved to a state where it started to stabilize.

However, anybody could change anything on the agenda at any time, talks could be switched even right before they’d have started, so it was recommended to check the schedule frequently: both for knowing the whats, wheres and whens, and for preventing conflicts between discussions you were planning to attend. For the rest of the event the most important rule was the Law of two feet which tells you to walk away from any discussion to another right when you realize that the topic is not exactly what you have expected. It’s not rude, it’s the expected behavior in that case! The evening ended with free chat in the hall with tasty snacks and some drinks.

For me, the second day started with a discussion on TDD at 10:00 AM. It was interesting to see how others use its principles and hear their stories about doing or not doing unit testing. We had developers who never write unit tests, some after-the-fact unit test writers, some TDD-ers, even BDD-ers as well. Since one of the initial questions was about finding personal motivation for learning and practising TDD, I added my five cents by recommending everyone to join or facilitate a Code Retreat near their locality, for example on 8th December, the Global Day of Code Retreat. We discussed a lot of aspects about unit testing, but the most valuable for me was the part about introducing TDD to fellow developers: most people won’t adopt methodologies like that until they realize they have a problem and then they see how it can be solved by the new methodology. And by the way: TDD does not automatically enforce better design, it’s experienced designer doing TDD who can do great designs! In the end, we agreed that TDD is like sex: once you have tried it, you never look back!

The second session I attended was about the after-life of builds: how do you get people to happily fix the builds they have just broken, what about deployment, how can you automatize it, how to upgrade or rollback data, etc. Right now I’m thinking about making our CI dashboard more visible, so for example, anyone entering our room would see our latest build statuses so we would be more motivated to keep it green and fix immediately. And by the way: do the impossible fifty times a day!

Next I took part in a discussion about making tests effective: we studied how and why the test pyramid often turns upside down, how the tests near the top get more and more slow and unstable, and how to fix them. These are so called second generation problems, since they imply that, at least, there is a bunch of automated tests for the code. There’s nothing wrong about running series of end-to-end or through-the-UI tests during whole nights frequently, but you need to care about the lower levels of your testing pyramid: the much faster unit tests (close to or above a thousand tests per second!) and integration tests. Have you heard about the Cup of Coffee Metric?

Oh, and it’s recommended to run a set of quick checks (learning tests if you like) before starting those whole-night testruns, just to see if the environment works as expected. Automated tests are code, so they should be treated like code as well: for example, decoupling from test frameworks can be a good idea. How old is your Selenium? Are you afraid of the cost of an upgrade? Are your tests cluttered with XPath expressions all over the place?

The fourth discussion I participated in was about knowledge transfer, especially regarding documentation. I like to take the “good code documents itself” advise literally, so I try to write the least possible amount of documentation. In certain cases, API docs is mandatory at least for public methods, so when writing those, I try to avoid quoting Captain Obvious. (You know, that guy is the one who writes a brief description for a method named writeStdOut() saying “write standard output”.) I name my tests so that even a non-techie person can understand the requirements I implemented from the verbose output of the test framework or from the list of tests run by the CI. But during this discussion, it turned out for me that in-code and runtime generated docs are not easily searchable for non-techies, and I cannot expect them for example to know that PHPUnit prints nice requirement list based on test method names when invoked with the --testdox commandline argument. I learned a great idea as well: in certain cases it’s a good idea to generate visualizations from tests code, for example, in case of a state transition diagram.

In the last time slot I visited the talk about risk analysis and risk management. We got ideas on thinking through the entire system to see what can fail, what parts of the system will a failure effect and how do we get to know about a failure. 5 whys vs. 5 so-whats.

The closing talk turned out to be a discussion as well: after some participants among those who filled out the feedback form won ebooks, the organizers, @Jtf and @PaulJulius asked everyone to share their biggest A’HA! moments. Man, I had plenty of such moments, especially during the TDD, the Effective testing and the Knowledge transfer sessions!

All in all, it was the most interesting, and perhaps the most valuable conference I’ve ever attended. Now it will take weeks or even months to process all the information, links and books that I learned about from the great variety of engineers I met there from so many different cultures, having so many different mindsets. Thank you CITCON, I hope to attend next year as well!

P.S: it’s pronounced kit-kon! :-)

Code Retreats at BalaBit

Tuesday, October 9, 2012 @ 11:10 PM Author: athos

Last week of September was about Code Retreat events for developers and test engineers at BalaBit. During the week we had internal Code Retreat days facilitated by Nucc and me to let everyone learn and practice the basics of Test Driven Development and handling legacy code, without the pressure of everyday work deadlines. On Saturday, September 29, BalaBit hosted the 4th event of Code Retreat Budapest (or 5th if you count last year’s Global Day of Code Retreat), with DeVill and me as facilitators. What surprises me at every Code Retreat I happen to participate in (in other words: all of them in Budapest (-:) is that you can always, always learn something new, no matter how many CR sessions and events you’ve attended.

To let participants focus on the techniuqes themselves, the job at a Code Retreat is always to implement the business logic of Conway’s Game of Life. There are 45 minutes long sessions during the day in which participants have to work in pairs, followed by short retrospective meetings and selection of a new pair for the next session. And of course, the code is deleted at the end of every session so the new one can be a green field project again. The problem usually becomes well-known by the second or third session, that’s when intensive learning can begin by introducing new constraints and concepts in every session.

Free hand session

The first session is typically free hand. The goal is to get familiar with the problem. Most of the time this session makes two kinds of developers emerge: UML-makers and test-drivers. The first kind typically spends a lot of time on designing a Death Star that is able to simulate Game of Life as well, and end up having coded nothing more than a simple data structure or a factory. Test-drivers usually write a few, sometimes a little bit complicated tests first, then write a few methods in the production classes, then debug for some time, then end up with a completely blown-up code base when the test for the next requirement to be implemented breaks everything that seemed to work until then. Most people don’t get to implement a single rule of the game, which is quite interesting, since the game does not seem to be that complicated at first glance.

Baby steps and TDD

The promise that TDD advocators often tell is that if it’s done right then debugging times can dramatically shrink. Since you are always a couple of minutes away from a green bar, it’s not a big deal to revert all your changes to that point in case you got into trouble. However, if your steps become too large, you may find yourself tens or even hundreds of code lines away from the last working state, so when you encounter an unexpected failure, you either throw away serious amounts of work, or you fire up your debugger or begin to add print statements to find out what’s happening and how it should be fixed. This is sad because you’re paid to write working code, not to throw some hack together then debug it to a point when it seems to be working, at least on your machine. That’s why baby steps are often emphasized at Code Retreat sessions: learning to take small incremental steps both in test, in production code and in the overall design can help getting that promise about less debugging and ability to throw away troublesome code and to explore other directions to move forward become true.

What is often hard to get the idea of is faking implementation. Why would anyone write return 0 just to pass a test, knowing that it will be overwritten in less than a minute? Aside from practicing the thought process, it’s similar to stubbing database layers in test harnesses: there’s something that’s out of your test’s scope. Real database setup and communication can make tests run slow, and anyways, when you’re testing your business logic, you don’t really want to test your 3rd party ORM tools and database adapters at the same time, otherwise your test failures won’t be able to tell which layer of your application may be broken. Fake implementations are somewhat similar to that idea: you use stub implementations (temporarily) in parts of the code you are developing that are out of the scope of tests written so far or don’t really belong to the abstraction you are working on.

For example, in case of Game of Life, it’s a good idea to write a test for the first rule first: this rule states that cells with less than two living neighbors must be dead in the next generation. The test will place a lonely cell and one with only one neighbor on the map, then produce the second generation and assert that all three cells are dead in that generation. Returning an empty array in the production code can make all those assertions pass. Since such a simple code can pass this test, it must be either wrong, or it is not really testing death of cells, at least at this point. The latter idea seems to make more sense, and if you think about it, the short-term value of this first test is a demonstration of the interface of the class under development. Looking at this first test, you may decide to make the interface more simple, rename some methods or experiment with other metaphors, etc. From this point of view, any more production code than what is enough to demonstrate a return value of a method may be considered out of scope.

The second rule states that any living cell that has two or three neighbors must live on to the next generation. The test for it may set up a configuration in which all the cells have two or three neighbors and assert that all of them survive. You see it fail, then write a loop that puts cells from the previous generation to the next if they have 2 or 3 neighbors, but how do you know the number of neighbors? Now you could write another loop to iterate over all the neighbors of a cell and increment a counter inside an if statement that verifies if a neighbor is alive, but that together means at least two loops and two if statements for just two tests. Game of Life is simple enough to write all that code at once in seconds without errors, but the job is not to write a working Game of Life but to explore techniques and mindset often associated with TDD, so let’s find an even smaller step instead! One might write a method that counts neighbors and make it return a value derived from the total number of cells on the map for example. This can make the second test pass by adding a loop and a conditional implementing the logic needed to make cells survive depending on the number of their neighbors (but not on the neighbors themselves nor on their position nor on anything else), forming one layer of abstraction, and a fake implementation for a different abstraction layer (ie. the neighbors). If you like, that stub for neighbor counting can be viewed as a degenerate neighbor counter that consideres every cell on the map a neighbor to each other. Test for the third rule will encourage you to replace that fake neighbor counter to a real one anyways. Until then, you have the fewest code possible, making it easier to refactor if you feel that things are going to a wrong or unnecessarily complicated direction. The less code you have in your way, the easier is is to spot and restructure the parts you don’t like.

2 parameters, 3 lines

These constraints are about simplistic, concise code. It’s nearly impossible to write code meeting these constraints at the first time, so you are encouraged to get there by refactoring. Doing so you might find that extracting a few small functions and making complicated ones to be compositions of those can enable nice functional programming tools, and also get a glimpse on what and when to extract, since following such rigid constraints blindly can lead to a code that is more obfuscated than clean. Applying the same restrictions on the test code can remove all the duplication in it as well, making the test less sensitive to even big changes in the structure and API of the production code, which is a good idea anyways. If your tests are in the way of refactoring, you loose all the benefits those tests were made to enable in the first place.

Agile sessions

Agile software development promotes the idea of responding to changes quickly and safely. Thoroughly unit tested code makes this possible by letting you refactor without fear before implementing a new requirement so you can add it to the system conforming to the Open/Closed Principle. New features are implemented by adding new code, not by patching and duct taping existing modules, right? Of course such a flexible architecture that enables that is impractical, if not impossible to create, but after an educated refactoring based on the knowledge of new requirements, you can nicely conform to that principle after all. To demonstrate the concept of such evolutionary design and how test-driven code can help during that process, we’ve come up with the idea of CR sessions involving unexpected requirement changes. Such a session is typically a longer one, we usually join the two last sessions for this experiment. During that we wait for participants to get close to a finished implementation of the Game, but before that could happen, we introduce a new requirement, for example, the age of cells, configurable rules, weight of cells inherited birth-time from the three neighbors, sometimes even hexagonal cells. Implementing any of these can be tricky in an architecture that was not designed with these features in mind and is obsessed with language primitives all over the place, but after isolating some behaviors, extracting relevant responsibilities or introducing new layers of abstractions, they are almost trivial to implement.

Legacy code

Unnecessary coupling and the lack of tests can make any code extremely hard to maintain. To learn the basics of getting bad code under test and then refactor it to be able to add features is quite hard to do during everyday work, but at a Code Retreat, people can safely experiment with it. That’s why I created this horrible implementation of the Game for our internal Code Retreat. It has a loop that can easily become an infinite one, it prints on the console, there are copy&paste snippets here and there, and to make things worse, it often employs bad naming and some overcomplicated helper methods. And we wanted new features in this mess that affected both the code responsible for applying the rules of the Game and the printing logic as well!

Most pairs find out quickly how to get rid of the infinite while loop in evolve() and how to isolate printing and the string representation of the Game’s state from the main logic, and after a few trivial changes, they could easily get the code under test. Some pairs even started by safeguarding those initial dependency-breaking refactorings with end-to-end tests so they were covered from the very beginning. In the end, many of the pairs managed to get a code base they could work with and test-drive the new features in, but for example, those who attempted to rewrite the code from scratch, had a hard time doing it and still failed, and to make things worse, they had painful debugging sessions as well. Imagine how rewriting code you don’t understand can fail if rewriting one that implements well known requirements fails so badly!

Conclusion

I can totally accept that some people still don’t like the idea of TDD and the quite unnatural ways of thinking associated with it and with evolutionary design. I just don’t want to end up integrating or maintaining their code. :-)

Stare dad vs unit tests

Thanks

I would like to say thank you to all the people at BalaBit who have a positive attitude towards learning and practicing the craft this way, and of course to all the craftsmen at Budapest who sacrificed a Saturday for practicing their profession! Meet you on the second Global Day of Code Retreat, on December 8th! And of course, greetings to fellow developers in London, who were also honing the craft on September 29th.

Does it really work? The Transformation Priority Premise

Monday, September 10, 2012 @ 12:09 AM Author: athos

Given a simple problem of sorting numbers in an array, how come that one TDD session leads to Bubble sort while the other produces QuickSort?

Robert C. Martin proposed the question in his talk titled The Transformation Priority Premise at the Norwegian Developers Conference in 2011, which was brought to my attention by DeVill. The topic is also discussed in a blogpost. In short, when more and more tests are created during a TDD session, the test code gets more and more specific, which forces the production code to get more and more generic. While the third step of TDD is refactoring (i.e. changing the code without changing its external behavior, which is verified by the tests written so far), the second, the pass step utilizes transformations, in other words: changing the code to make its behavior more generic. Uncle Bob has a small list of such transformations, from which he recommends choosing the highest one that can make the failing test pass and avoiding the items near the bottom of the list if possible. Doing so is believed to yield better code.

        ({} -> nil)         no code at all -> code that employs nil
       (nil -> constant)
  (constant -> constant+)   a simple constant to a more complex constant
  (constant -> scalar)      replacing a constant with a variable or an arg
 (statement -> statements)  adding more unconditional statements
 (statement -> if)          splitting the execution path
    (scalar -> array)
     (array -> container)
 (statement -> recursion)
        (if -> while)
(expression -> function)    replacing an expression with a func or algorithm
 (stateless -> assignment)  replacing the value of a variable.

Using only those transformations during the pass step of TDD will really force you to take baby-steps! :-)

According to Uncle Bob, the difference between the QuickSort and Bubble sort TDD sessions basically originates from one single decision at an early stage of development. For example, assume the following stage of the TDD iteration, in Python:

class SorterTest(unittest.TestCase):
    def test_sorting(self):
        sorter = Sorter()
        self.assertEquals([], sorter.sort([]))
        self.assertEquals([ 1 ], sorter.sort([ 1 ]))
        self.assertEquals([ 1, 2 ], sorter.sort([ 1, 2 ]))
        self.assertEquals([ 1, 2 ], sorter.sort([ 2, 1 ]))

class Sorter(object):
    def sort(self, elements):
        return elements

At this point, the test using [2,1] is failing and we are to modify the production code to make it pass. One possibility is to swap the elements of the input array if they are in the wrong order:

class Sorter(object):
    def sort(self, elements):
        if len(elements) > 1:
            if elements[0] > elements[1]:
                elements[0], elements[1] = elements[1], elements[0]
        return elements

Swapping the elements of the input array will change the outside world as well since in Python, lists are passed by reference. This is obviously the last element from the list: (stateless->assignment). I tried it, and it trivially lead to a bubble sort. Here’s the screencast:

(Watch the video on YouTube)

What surprised me is how easily it turned into quicksort when instead of swapping elements, I satisfied the failing testcase by appending elements to a new array:

class Sorter(object):
    def sort(self, elements):
        if len(elements) < 2:
            return list(elements)

        sorted = []
        if elements[0] < elements[1]:
            sorted.append(elements[0])
            sorted.append(elements[1])
        else:
            sorted.append(elements[1])
            sorted.append(elements[0])
        return sorted

There’s a screencast for that as well:

(Watch the video on YouTube)

Doing these experiments made me think. Does the same trick work for other well known algorithms? Take searching in a sorted list for example. I could not imagine how those baby-steps could lead to binary search, the fastest algorithm I know for this problem. My first experiment ended up with an implementation of linear search, kinda like what I expected:

(Watch the video on YouTube)

However, I realized I’d broken the rules of TDD. TDD says that you must implement the simplest code that can make the currently failing test pass, but what I did for one of the tests near the beginning was not the simplest solution. At this early stage, everything was fine:

class SearchTest(unittest.TestCase):
    def test_when_value_is_found_in_sorted_list_then_its_key_is_returned(self):
        search = Search()
        self.assertEqual(0, search.search([ 1 ], 1))
        self.assertEqual(0, search.search([ 1, 2 ], 1))

    def test_when_value_is_not_in_sorted_list_then_exception_is_raised(self):
        search = Search()
        self.assertRaises(NotFound, search.search, [], 1)

class Search(object):
    def search(self, sorted_array, value):
        if len(sorted_array) > 0:
            if sorted_array[0] == value:
                return 0
        raise NotFound

Then I added a new test, which was still okay:

self.assertEqual(1, search.search([ 1, 2 ], 2))

But then I added two more if statements to the production code:

class Search(object):
    def search(self, sorted_array, value):
        if len(sorted_array) > 0:
            if sorted_array[0] == value:
                return 0
        if len(sorted_array) > 1:
            if sorted_array[1] == value:
                return 1
        raise NotFound

This is not the simplest code that can make the test pass. There is at least one that is more simple:

class Search(object):
    def search(self, sorted_array, value):
        if len(sorted_array) > 0:
            if sorted_array[0] == value:
                return 0
            if sorted_array[1] == value:
                return 1
        raise NotFound

And when the constant 0 is turned into a variable named lower and the 1 is replaced with the last index of the array named upper, then the algorithm magically starts to resemble a binary search! Though implementing binary search this way is not a brainlessly trivial exercise (see the update at the end of the post for a detailed explanation), the algorithm almost fell out from my hands as I added more and more tests.

(Watch the video on YouTube)

If I was surprised when quicksort appeared on my screen, I don’t know what I was when a shitload of if statements turned into binary search right before my very eyes.

P.S: yes, in the binary search demo, middle would have been a better name for the variable than i. :-)

Update: seems that this binary search screencast is not very self-explanatory. Let’s start from here:

class SearchTest(unittest.TestCase):
    def test_when_value_is_found_in_sorted_list_then_its_key_is_returned(self):
        search = Search()
        self.assertEqual(0, search.search([ 1 ], 1))
        self.assertEqual(0, search.search([ 1, 2 ], 1))
        self.assertEqual(1, search.search([ 1, 2 ], 2))

    def test_when_value_is_not_in_sorted_list_then_exception_is_raised(self):
        search = Search()
        self.assertRaises(NotFound, search.search, [], 1)

class Search(object):
    def search(self, sorted_array, value):
        if len(sorted_array) > 0:
            if sorted_array[0] == value:
                return 0
            if sorted_array[1] == value:
                return 1
        raise NotFound

At this point, all the tests pass. I cannot seem to find a useful refactoring here, however, I don’t like those two if statements. I add a test that allows me to make one of them more generic:

self.assertEqual(2, search.search([ 1, 2, 3 ], 3))

This could be solved by inserting one more if statement, but there are transformations higher on the priority list that can also make the test pass: replacing the constant 1 with the last index in the array:

class Search(object):
    def search(self, sorted_array, value):
        if len(sorted_array) > 0:
            if sorted_array[0] == value:
                return 0
            if sorted_array[len(sorted_array) - 1] == value:
                return len(sorted_array) - 1
        raise NotFound

Now let’s refactor: len(sorted_array) appears three times in the code. By changing the greater-than comparison to an equivalent greater-than-or-equal comparison in the condition, all three occurrences can be refactored to the exact same format. To get rid of this redundancy, this expression can be given a name by introducing a variable, however, the value of that variable never changes during an individual execution of the search() method, so what we have done so far can be viewed as a constant->constant+ transformation. Certainly higher on the list than statement->if.

class Search(object):
    def search(self, sorted_array, value):
        upper = len(sorted_array) - 1
        if upper >= 0:
            if sorted_array[0] == value:
                return 0
            if sorted_array[upper] == value:
                return upper
        raise NotFound

Still wearing the Refactoring hat, we can introduce a name for the other constant as well: it’s the lower boundary of the array.

class Search(object):
    def search(self, sorted_array, value):
        lower, upper = 0, len(sorted_array) - 1
        if upper >= lower:
            if sorted_array[lower] == value:
                return lower
            if sorted_array[upper] == value:
                return upper
        raise NotFound

Notice that at this point, we have the terminating condition for the loop of a binary search algorithm.

The similarity between the two inner if statements still bugs me, but I don’t know how to eliminate at least one of them, so I keep going by adding a new test:

self.assertEqual(1, search.search([ 1, 2, 3 ], 2))

This introduces the concept of an element in the middle of the array. Sadly enough, the highest item on the list I can come up with is another damn if statement:

class Search(object):
    def search(self, sorted_array, value):
        lower, upper = 0, len(sorted_array) - 1
        if upper >= lower:
            if sorted_array[lower] == value:
                return lower
            if sorted_array[1] == value:
                return 1
            if sorted_array[upper] == value:
                return upper
        raise NotFound

Carefully inserted into the middle. There it does not seem to hurt much, and at the same time, it’s still easy to enforce moving it from there by introducing a test that expects a NotFound exception to be raised when searching in an array of length 1. However, this does not solve the bigger problem with this new if statement. The last test was intended to make the code look at the middle element in the array but what this statement does is looking at the second element, no matter what. In other words, it’s not quite generic enough. I could pretend not to realize this and add a new test that expects the middle element to be found in a 5 long array, but TDD allows you to take bigger steps when you feel confident, so I replaced the constant 1 with the real index of the middle element which happens to be (lower+upper)/2:

class Search(object):
    def search(self, sorted_array, value):
        lower, upper = 0, len(sorted_array) - 1
        if upper >= lower:
            if sorted_array[lower] == value:
                return lower
            if sorted_array[(lower + upper) / 2] == value:
                return (lower + upper) / 2
            if sorted_array[upper] == value:
                return upper
        raise NotFound

However, since nor upper nor lower are changed in this function, this expression is still a fancy way of saying 1, at least in the case of the last test written sor far. Also, changing that explicit 1 to this more flexible formula allows the if statement in question to be moved higher inside the outer conditional, simply because it will never attempt to acces an array index out of bounds.

Next thing I did in the screencast was to give this expression a name, though, not the best name as I mentioned earlier. :-) Here I’m using the better name, middle instead of the i used in the screencast.

class Search(object):
    def search(self, sorted_array, value):
        lower, upper = 0, len(sorted_array) - 1
        if upper >= lower:
            middle = (lower + upper) / 2
            if sorted_array[middle] == value:
                return middle
            if sorted_array[lower] == value:
                return lower
            if sorted_array[upper] == value:
                return upper
        raise NotFound

The next testcase was a tricky one:

self.assertEqual(2, search.search([ 1, 2, 3, 4 ], 3))

None of the inner if statements finds that 3 right now, however, after the first one fails to find it, we can force the second one to do it anyways. When the element in the middle is too small, we can simply set lower to 2, which will trigger the second if statement and bang, our test is green again! Of course we could write lower=2 but we know that this code wouldn’t live for a long time, so let’s use middle+1 instead, which seems to be more future-proof. Remember what TDD says about being confident. Also note that at this point, all our variables and expressions are just fancy ways of saying 0 or len(sorted_array), but for a given input, all of them behave like constants. It does not really matter at this point if you say 2, 1 + 1 or middle+1.

class Search(object):
    def search(self, sorted_array, value):
        lower, upper = 0, len(sorted_array) - 1
        if upper >= lower:
            middle = (lower + upper) / 2
            if sorted_array[middle] == value:
                return middle
            if sorted_array[middle] < value:
                lower = middle + 1
            if sorted_array[lower] == value:
                return lower
            if sorted_array[upper] == value:
                return upper
        raise NotFound

The same idea leads to the fifth inner if statement after a new test:

class Search(object):
    def search(self, sorted_array, value):
        lower, upper = 0, len(sorted_array) - 1
        if upper >= lower:
            middle = (lower + upper) / 2
            if sorted_array[middle] == value:
                return middle
            if sorted_array[middle] < value:
                lower = middle + 1
            if sorted_array[middle] > value:
                upper = middle - 1
            if sorted_array[lower] == value:
                return lower
            if sorted_array[upper] == value:
                return upper
        raise NotFound

Note that everything we did so far was either renaming some constants and parameters or choosing one element from the list of transformations!

But what to do now? First, let’s understand the code we did so far. This bunch of if statements can be divided into two parts. The first three inner if statements check one element near the middle of the array. This part of the code immediately returns the index when it finds the value it was looking for or changes either the lower or the upper boundary. What it means is that the array gets split into two parts: one that surely does not contain the value and the other that may do. This is true since the array is sorted. Then the second part, the last two inner if statements check the first and the last element of that latter subarray. What if we copy&paste-ed all those conditionals? There would be ten of them and they would either find the value or split the array again. Everytime we executed those five conditionals one after the other, the subarray that might contain the value we are looking for would become smaller and smaller, thus lower and upper would get closer and closer to each other. Actually, for iteratively splitting the array, repeating the first three if statements are just enough. Thinking through this finally makes it possible to eliminate the last two conditionals and safely change the outer if to while. Oh, how long I was waiting for being able to delete those if statements!

And voila, there is the binary search algorithm:

class Search(object):
    def search(self, sorted_array, value):
        lower, upper = 0, len(sorted_array) - 1
        while upper >= lower:
            middle = (lower + upper) / 2
            if sorted_array[middle] == value:
                return middle
            if sorted_array[middle] < value:
                lower = middle + 1
            if sorted_array[middle] > value:
                upper = middle - 1
        raise NotFound

Why do estimates keep growing?

Saturday, June 30, 2012 @ 07:06 PM Author: athos

Let’s develop software the pre-agile way! Our application will be an Enterprise Food Store Shopping Cart Total Price Calculator, which obviously will be used to figure out how much we are going to pay for some bread and butter for our sandwiches at the supermarket. We are going to develop our software real fast, which means we will not involve big up-front planning and we will not waste our time on unit testing, since for such a trivial application, bothering with the tests instead of the features just slows us down and clutters our code with unnecessary indirections, right?

The language is going to be Python, just to keep things simple. (Sorry PHP. :-P )

The problem

We are going to buy different kinds of food. Every kind has its own price which is specified as Hungarian Forint (HUF) per unit which can be kilograms for some kinds, pieces for others, etc. Of course the actual price for a given item in our shopping cart will be the amount (ie. number of units) bought multiplied by the unit price, and the total price will be the sum of all those prices. That’s just a for loop and elementary math, no surprises expected.

Fast software development

We will need a simple data structure for the kinds of food to store unit prices. We are going to use integers since we have no smaller units of money than HUF in Hungary since decades. Some say value objects are cool, so we are going to use a simple immutable data structure like this:

class Food(object):
    def __init__(self, unit_price):
        self.unit_price = int(unit_price)

    def getUnitPrice(self):
        return self.unit_price

We need to represent individual items in the shopping cart in order to simplify the overall price calculation. Smells like another data structure:

class ShoppingCartItem(object):
    def __init__(self, kind, amount):
        self.amount = float(amount)
        self.kind = kind

    def getPrice(self):
        return self.kind.getUnitPrice() * self.amount

Note that the amount can be an arbitrary real number, for example we might want to buy 0.5 kg of onions. On the contrary, 0.5 piece of bun should not be allowed, but let’s just ignore that for now. We might develop validation of such constraints when users start complaining about them.

Finally, we need to calculate the total price:

class TotalCalculator(object):
    def total(self, shopping_cart_items):
        total = 0
        for item in shopping_cart_items:
            total += item.getPrice()
        return total

We never said we wouldn’t test at all:

def main():
    onion = Food(100)
    cucumber = Food(200)
    tomatoe = Food(300)
    items = [
        ShoppingCartItem(onion, 1),
        ShoppingCartItem(cucumber, 2),
        ShoppingCartItem(tomatoe, 3)
    ]
    total = TotalCalculator().total(items)
    print "Total: ", total, " HUF"

if __name__ == "__main__":
    main()

We expect it to output 100 * 1 + 200 * 2 + 300 * 3 = 1400 HUF, and it works just fine.

Overall development time: a few minutes, maybe less. That’s what I call fast!

We’re ready!

Turns out our Enterprise Food Store Shopping Cart Total Price Calculator becomes so successful we start to get feature requests from a growing number of users after a couple of weeks.

Base 5 rounding

In Hungary, we don’t use some small units of our money when paying in cash because of some economic or political crap I don’t happen to give a duck about. Actually, we don’t have 1 HUF nor 2 HUF coins at all, the smallest value expressable in cash is 5 HUF. So we round everything in base 5. For example:

  • 11 HUF and 12 HUF are rounded downwards to 10 HUF,
  • 13 HUF and 14 HUF are rounded upwards to 15 HUF,
  • 16 HUF and 17 HUF are rounded downwards to 15 HUF,
  • 18 HUF and 19 HUF are rounded upwards to 20 HUF.

Now our users want us to implement these rules, and we are happy to do so, it’s just a few more lines of code, pretty harmless:

class TotalCalculator(object):
    def round_in_base_five(self, price):
        remainder = price % 5
        if remainder <= 2:
            return price - remainder
        return price + 5 - remainder

    def total(self, shopping_cart_items):
        total = 0
        for item in shopping_cart_items:
            total += self.round_in_base_five(item.getPrice())
        return total

Impossible to screw up, but just to be sure, let’s modify our little test to see the new function working. Just create a food kind with a price of 1 HUF/kg, and see how it’s price is rounded when we buy 10 kg, 11 kg, 12 kg of it and so on:

def main():
    onion = Food(1)
    for i in range(10, 21):
        total = TotalCalculator().total([ ShoppingCartItem(onion, i) ])
        print i, " ~ ", total

This code outputs the rules exactly as specified, so we’re ready. Development time: one more minute. What a professionalism here!

Ooops, a bug there!

Oh, well, a couple of days or maybe weeks later a user will complain that if onions cost 121 HUF/kg and we buy just 0.1 kg of them then the application will calculate 15 HUF instead of 10 HUF as the price. The total price should be 10 because 121 * 0.1 = 12.1 which when rounded to integers (remember, there are no cents for HUF) yields 12 which should be rounded downwards in base five. After some time of debugging (maybe comparable to the total cost of the initial development of the rounding feature, though) we find out that we forgot to round the price to integers before doing the base 5 rounding. A simple fix for that:

def round_in_base_five(self, price):
    price_int = round(price)
    remainder = price_int % 5
    if remainder <= 2:
        return price_int - remainder
    return price_int + 5 - remainder

Does it work?

def main():
    onion = Food(121)
    total = TotalCalculator().total([ ShoppingCartItem(onion, 0.1) ])
    print "Total: ", total, " HUF"

That outputs 10 HUF, we’re done.

Taxes

A month later we promise to some of our users that we develop support for taxing rules, since different kinds of food can have different rates of taxes. It seems to be just another property for our Food data structure and another multiplication in getPrice(). We bet we can write that code in seconds!

But during testing we find some interesting behavior: when onion costs 102 HUF/kg and cucumber costs 202 HUF/kg and we buy 1 kg of each, the total price is calculated to be 300 HUF (without taxes), but 102 + 202 = 304 which should be rounded upwards to 305. We revert our taxing related changes but the behavior remains reproducable, so it’s not a regression introduced during the development of taxing. After a short debugging session (just for curiosity) we find that total() applies the base five rounding inside the for loop instead of rounding the total price only:

def total(self, shopping_cart_items):
    total = 0
    for item in shopping_cart_items:
        total += self.round_in_base_five(item.getPrice())
    return total

Commit messages in the version control system don’t tell anything special about that line, the method seems to be unmodified since the introduction of the rounding feature, so we scan through our months old emails to find that PDF containing the feature request and the exact requirements about rounding, in the hope that it will help us remember why it’s inside the loop. Sadly our documentation seems to be lacking that information. The game is two-fold now: either it’s a bug, maybe a misunderstanding of some ambiguity in the specs, or an intended behavior our users requested back then. In short, we have no evidence, we need to talk to a domain expert.

Development time of the taxing feature: oh, so we were developing a feature before we started digging through aeons old docs to find out if a strange behavior is a bug or a feature! Yeah, those took about 20 minutes altogether instead of our original estimations of a few seconds. And by the way: how exactly are we supposed to calculate taxes? Should they be based on the integer rounded price or the real number (real as in Math)? How do they relate to base 5 rounding? (Answers for these questions may also require some rounding related code to be moved to getPrice().)

Retrospective: uhm, well, if we gave up on that former issue right when we found out it’s not a regression, then we might have been able to meet our original estimations. We promise ourselves next time we will just file a new bug in our bug tracker, which we can fix later. Except when such a bug makes the new feature untestable. Remember, we need to move fast, so we have no time to investigate bugs in such detail or even fix them. And to make things worse, such investigations may also question how existing features do and how they should work, and answering those questions will surely require even more time!

So what?

Our code consists of 3 very simple classes (2 of which are stupid data structures), and all it does is evaluating a simple formula using a for loop and a conditional statement. Yet we had 2 bugs in it, one of which seems to produce an endless stream of questions about almost everything we’ve done so far, leading us to distrust our own code we were so proud of once.

On the other hand, if we found a testcase somewhere in our code named test_base_5_rounding_is_applied_to_individual_items() or similar, we would be able to make some assumptions that we could use to make our life easier. For example we’d know that at some point of development it was considered expected behavior somebody thought of while developing the feature. It may have been reviewed by customers as well when we presented them the list of requirements we implemented and are happy to retest anytime a new build is created (e.g. by our CI system through automated, quickly running tests) to make sure we never loose them. We may alter that behavior of course if people’s minds changed since then (starting with renaming the testcase, then changing the expected values in it), but this would also be a well-thought decision. Besides, reading through the names of testcases in the unit tests of TotalCalculator is surely easier than finding that email, PDF, wiki page or whatever documentation which provides an exact, formal and detailed specification. You may even think of your tests as a so exact, so formal and so detailed specification that they can be run on a computer to verify the correctness of the system. Beat that with emails and wiki pages!

Conclusion

We turned a very simple, few eLoC source code into a rotting maintenance hell in minutes of effective work through a short series of seemingly rational, closely reasoned decisions and nice and mostly clear specifications. Now imagine how many times similar mistakes we made throughout this little experiment can be done during the development of huge and complex systems and how the effects can add up! For me, that question about growing estimates (and still missed deadlines) in legacy code base (by definition: code lacking low level automated tests) is answered.

Game of life – never getting boring

Thursday, May 31, 2012 @ 09:05 PM Author: athos

Do you think it’s funny to implement Conway’s Game of Life for the 20th time? According to my experiences, the answer is hell yeah!

Since the Global Day of CodeRetreat event (organized by Marton Meszaros in Hungary), I’m a big fan of the CodeRetreat format of coding dojos. I’ve been attending all the events of the CodeRetreat Budapest Community, and at the last one, I even helped coaching during some of the sessions. Nucc and me decided to introduce this event to our fellow developers at BalaBit as well and fortunately the management was kind enough to support us by dedicating some working hours to learn about TDD and other modern software development methodologies.

At a CodeRetreat event, you always implement the four rules of Game of Life (without UI and such fancy stuff). Doing the same task again and again makes the problem well-known and understood so you can focus on the style and design of your tests and code, which is the goal of a CodeRetreat. You don’t have to finish the implementation, though, it’s not about Getting Things Done or competing which pair is the first to finish. More like Not Getting Things Done. Often it’s impossible as well to finish for you have only 45 minutes to work. After that, well, code is deleted, a short retrospective meeting is held then a new session begins. Pair programming is mandatory, and by changing the pairs after every session you’re guaranteed to learn a lot from others. To make sure that you won’t be able to train yourself during the day to finish new constraints are introduced in every session, like infinite grid, immutability of states, 3 line methods with at most 2 parameters, no-mouse (did you notice you go faster when you use only keyboard shortcuts in your IDE?), etc..

During the first few sessions, there are no constraints. The goal is getting the problem understood. In the implementation, you just do whatever you want. That seems to be the hardest constraint though, since people tend to not finish during the first sessions. Instead they do Big Design Up Front and forget to care about the software to be implemented. These are the sessions where nice UML diagrams about Cells and Neighborhood relations and Worlds and Games and Rules and such nicely designed stuff are drawn, then people realize after 45 minutes that they spent almost an hour working on getters and setters but not a single rule of the game got implemented. Yep, there’s a reason for the Four Rules of Simple Design having been invented. ;-)

Turns out if you’d started the very first testcase with the first rule (“Any live cell with fewer than two live neighbours dies, as if caused by under-population”), you might have had a working implementation of all four of them after 15-20 minutes or so. And nothing more than that since you wouldn’t have written code that is never used except for making those UMLs feel more complete. YAGNI!

Of course there’s a chance that such a quick and dirty implementation will be, well, dirty. It may contain at least one thing you consider ugly. But you have more than half of the session left and the safety net of a bunch of tests to refactor the hell out of it!

You spoiled the public interface of your Game with methods that should belong to a class named Board or World or Map instead? Are you feeling the need to test a method that is declared private? After extracting such methods into a separate class, you might end up with a Game class that has no state at all and contains only one public method that just creates new generations of cells based on a previous one.

On the other hand, after realizing how over-engineering slows them down, many people try to continue with sloppy design, skipping the refactorings for example to get rid of language primitives in the most abstract levels of the business logic or to conform to the SOLID principles, etc. They can be forced to do so by introducing requirement change surprises. For example, to handle hexagon maps along with square grids, a Topology or similar interface has to be extracted from the class representing the board, which can encapsulate the details of coordinate systems and neighborhood relations between coordinates. (Did you notice that the rules of the game do not depend on the placement and shape of the neighbors nor the neighbors themselves, but only their count is what matters?) To handle colored cells which know how old they are, cells have to be represented by a separate class instead of bare coordinate-tuples or boolean flags.

Such refactorings can be a pain in the ass when tests contain copy&paste, duplication, redundancy and duplication, so they must be cleaned up as well: by extracting common fixtures, creating abstract assertions based on the repeating patterns of similar assertions, etc. (Abstract assertions even help you to conform to such rigorous principles like “1 assertion per test.”)

Imagine your day to day software development work boiling down to such simple business requirements implemented by a couple of short but well named methods in a few simple and minimalistic classes, all of them thoroughly unit tested (by unit testing I mean unit testing), all of those tests proven to be useful (since you’ve seen them fail and turning into green a couple of seconds later) and providing 100% coverage of assertions. Wouldn’t it be easier than digging yourself through stack traces in mindblowingly complex codes full of nested infinite-but-conditionally-breaking-somewhere-in-the-middle-do-while loops depending on side effects of getter functions and badly named global variables coming from a dozen layers away? If you know this feeling I’m talking about, I’d recommend you to visit the next CodeRetreat near your location (look around in Twitter or Facebook) – to try something new!

Global Day of CodeRetreat 2011

Monday, December 5, 2011 @ 10:12 PM Author: athos

Saturday Nucc and me were attending the Global Day of Code Retreat 2011, Budapest event (#gdcr11) invented by Corey Haines and brought to Hungary by Marton Meszaros. In short, it was great fun and I hope that little community I met that day will come together frequently.

In more detail, Coderetreat is a coding dojo based on Conway’s Game of Life. In 5 or 6 sessions, each taking 45 minutes, you write code with your pair in order to translate the rules of the game into a working software. It does not have to display the board on a shiny UI, no animation has to be rendered, making all the tests green is just enough. Knowing you must delete all the code at the end of each session and that 45 minutes is rarely enough to finish implementing the game with full test coverage (and additional constraints you are free to pick), you can concentrate on the code quality itself: encapsulation, good naming, SRP, TDD, KISS, functional programming, or whatever skills you want to practice and develop.

After each session, code is deleted and a retrospective meeting is held where everyone can tell others what was learnt and what needs improvement during future sessions. Of course, during the whole day, your social skills get improved as well: pairing with unknown people, working to solve problems together requires both of you to understand and communicate with each other.

GDCR11 was a global event: coders all around the world came together to practice writing code without deadlines, legacy code base, ever changing requirements and such, just to form communities, to meet others, and the most important: to learn.

And by learning, I don’t just mean seeing different languages and testing frameworks in action or experimenting with TDD and CleanCode principles in general, there’s way more than those. For example, by pairing with people I’d just met, I saw how hard it can be to work with me sometimes: I noticed I’m just too bull-headed sometimes to accept others people’s ideas. It was also interesting to see how others interpret rules of TDD, how big are the steps they take while writing tests and code.

It was totally worth it to attend (btw. thanks for the sponsors, Mimox and MyCorporation!), I’m sure I will be there at the next one (note that it doesn’t have to be a global event, a day of Coderetreat can be organized at any time, anywhere), and I recommend it for any developers enthusiastic about developing their professions as well!

Update: Oh, there is a Facebook page for Coderetreat@Budapest!

Test-driving a regular expression

Sunday, October 23, 2011 @ 01:10 AM Author: athos

“When you have a problem to solve, use regular expressions! Now you have two problems!”

Regular expressions are very useful tools in the inventory of a programmer: they can be used to validate a string according to a given format, or to find and extract parts of a string, or even to change substrings matching a given pattern. But when you write some hellish tricky regular expression, do you stop for a moment to think about the spaghetty of while loops and switch-case statements that your regexp will translate to? I mean just a few characters of regexp may specify a state machine that would take tens or maybe hundreds of lines of loops and conditionals to implement. All that amount of code gets executed whenever you use your regexp, so you would need a big amount of testcases to cover that state machine if you happen to practise Test-driven development (TDD). In this experiment I’m going to blindly apply the three well-known steps of TDD to create a regular expression for a given task, and see how far I can get and how much time it takes.

These three steps of the TDD cycle are:

  1. Write a failing unit test (compilation errors are considered a failure).
  2. Write just enough production code and not a single line more to pass the failing test.
  3. Cleanup the mess created during the first two steps.

Or in short:

  1. Red
  2. Green
  3. Refactor

My kata

My kata for this experiment will be validating a simple time specification, that contains two digits of hour and minute parts separated by a colon and either AM or PM in case of 12 hour notation. The rules are going to be:

  • Valid time specifications should always contain a two digit number (with a leading zero if necessary) describing the hour of the day followed by a colon followed by a two digit number (with a leading zero as well) to describe the minute of the hour.
  • The HH:MM part may be followed by a single space followed by two alphabetic characters, either AM or PM. In this case the time specification is considered to be in 12 hour format, thus the hour part should not be more than 12.
  • When there is no AM or PM at the end of the string, the time specification is considered to be 24 hour format, in which case the hour part should be between 00-23 (inclusive).
  • In any case, the minute part must be between 00-59 (inclusive).

The job is to decide if a string matches all the above rules or not.

There are a million possible implementations, but for now, let’s evolve this one by applying the TDD development cycle to meet the requirements. The language is going to be PHP, the testing framework will be PHPUnit, and the implementation will rely on PHP’s preg_match() function. First, let’s start from very small, trivial steps. Going forward, we may speed things up, just like in Kent Beck’s wonderful book.

The first testcase

In TDD, tests come first, so let’s create a simple testcase:

<?php

require_once 'TimeValidator.php';

class TimeValidatorTest extends PHPUnit_Framework_TestCase
{
    public function testTimeValidatorCanBeCreated()
    {
        new TimeValidator();
    }
}

?>

This will fail badly, obviously, but let’s run it, just to be sure!

PHPUnit 3.5.15 by Sebastian Bergmann.

PHP Fatal error:  Class 'TimeValidator' not found in /home/athos/projects/TimeValidator/TimeValidatorTest.php on line 7
PHP Stack trace:
PHP   1. {main}() /usr/bin/phpunit:0
PHP   2. PHPUnit_TextUI_Command::main() /usr/bin/phpunit:49
PHP   3. PHPUnit_TextUI_Command->run() /usr/share/php/PHPUnit/TextUI/Command.php:129
PHP   4. PHPUnit_TextUI_TestRunner->doRun() /usr/share/php/PHPUnit/TextUI/Command.php:188
PHP   5. PHPUnit_Framework_TestSuite->run() /usr/share/php/PHPUnit/TextUI/TestRunner.php:305
PHP   6. PHPUnit_Framework_TestSuite->runTest() /usr/share/php/PHPUnit/Framework/TestSuite.php:733
PHP   7. PHPUnit_Framework_TestCase->run() /usr/share/php/PHPUnit/Framework/TestSuite.php:757
PHP   8. PHPUnit_Framework_TestResult->run() /usr/share/php/PHPUnit/Framework/TestCase.php:576
PHP   9. PHPUnit_Framework_TestCase->runBare() /usr/share/php/PHPUnit/Framework/TestResult.php:666
PHP  10. PHPUnit_Framework_TestCase->runTest() /usr/share/php/PHPUnit/Framework/TestCase.php:628
PHP  11. ReflectionMethod->invokeArgs() /usr/share/php/PHPUnit/Framework/TestCase.php:738
PHP  12. TimeValidatorTest->testTimeValidatorCanBeCreated() /home/athos/projects/TimeValidator/TimeValidatorTest.php:0

Yeah, we are in Red! Now we can write a tiny bit of production code in the hope we will get into the Green phase:

<?php

class TimeValidator
{
}

?>

And now the test passes, we’re in Green.

Is there anything to refactor here? We have 0 ELoC, so let’s move on. I’m going to ignore Tell, don’t ask! for this kata, so I’m going to call the one and only public method of my TimeValidator class isValid(). Maybe later I’ll come up with a cleaner name for both the class and the method, but it will do for now. But before we could write any more lines of production code, we need a failing testcase first. So let’s do some actual work in the testcase:

public function testTimeValidatorCanBeCreated()
{
    $validator = new TimeValidator();
    $this->assertTrue($validator->isValid('10:42'));
}

There’s no such method, so this will fail. Let’s implement the method with an empty body in TimeValidator.php:

public function isValid($time_specification)
{
}

Again, the test fails because NULL does not equal to true. We’re still in Red, so we can write some more production code, because the lines we have written so far are not enough to get us into Green. What is the simplest algorithm that will make our failing testcase pass if implemented in the production code? Yes, it’s a big facepalm:

public function isValid($time_specification)
{
    return true;
}

And now we’re in Green. Let’s do some refactoring!

The name of our testcase lies: it says it checks the construction of our newly defined class, but actually the return value of a function is being checked. So let’s rename the testcase as a refactoring:

public function test24HourFormatIsAccepted()
{
    $validator = new TimeValidator();
    $this->assertTrue($validator->isValid('10:42'));
}

I decided to consider 10:42 to be in 24 hour format, because that is the simplier one.

Is there any more opportunity to refactor? Though our validator is very useless at the moment, we don’t have any tests to enforce us to implement some real functionality in that method, so we’re not allowed to change that code to something more complex. We were required to write just enough production code to pass the failing test, and we’ve done exactly that and not a single line more.

Now what about creating a failing testcase to let us move forward?

A negative test

After a few seconds of thinking, I decided to test if our method does care about colons. Any kinds of time specifications mentioned in the requirements must contain a colon character, so it gives us a fairly simple testcase:

public function testTimeSpecificationMustContainOneColon()
{
    $validator = new TimeValidator();
    $this->assertFalse($validator->isValid('1042'));
}

Sadly Hooray, a failure! We’re back in Red, let’s write some more production code! Sadly a fairly simple change to our isValid() can make this test pass:

public function isValid($time_specification)
{
    return $time_specification != '1042';
}

Back in Green, but far from any meaningful code. At least we can refactor, luckily we have some code duplications:

  • There’s a repeating pattern in our tests: instantiation of the class under test (CUT).
  • The string constant ’1042′ appears in both the test and the code!

In the Refactoring phase, we’re allowed to remove code duplication while keeping the tests pass and the functionality of the code unchanged. Let’s deal with the first problem, PHPUnit offers the setUp() method for that:

class TimeValidatorTest extends PHPUnit_Framework_TestCase
{
    public function setUp()
    {
        $this->validator = new TimeValidator();
    }

    public function test24HourFormatIsAccepted()
    {
        $this->assertTrue($this->validator->isValid('10:42'));
    }

    public function testTimeSpecificationMustContainOneColon()
    {
        $this->assertFalse($this->validator->isValid('1042'));
    }

    private $validator;
}

That was piece of cake, now let’s clean the production code! The simplest PHP line that came to my mind to make both testcases pass without the dirty hack of repeating constants in the production code used by the tests to check the behavior of the code, was an strpos() call. But knowing that in the end I want to see a regular expression doing the work, I used an equivalent regexp matching instead:

class TimeValidator
{
    public function isValid($time_specification)
    {
        return preg_match('/:/', $time_specification) == 1;
    }
}

The tests still pass, so it’s time to look for more difficult cases. (And speed things up a little bit.)

Only one colon

The next test that came to my mind was to see if the regexp can deal with two colons:

public function testTimeSpecificationMustContainOneColon()
{
    $this->assertFalse($this->validator->isValid('1042'));
    $this->assertFalse($this->validator->isValid('10::42'));
}

Red phase again, but it’s not very difficult to get into Green:

public function isValid($time_specification)
{
    return preg_match('/^[^:]*:[^:]*$/', $time_specification) == 1;
}

Now we look for exactly one colon from the beginning of the string to the very end. And we’re in Green! Anything to refactor here? Yes, duplicated non-trivial code appears in the regular expression! Let’s get rid of it:

public function isValid($time_specification)
{
    $not_colon = '[^:]*';
    return preg_match(
        "/^$not_colon:$not_colon\$/",
        $time_specification
    ) == 1;
}

The tests still pass. Now there is still repeated code in the regular expression and to make things worse, it got longer, but this will move us forward, so knowing what to come, it’s a good idea to extract that little piece. Besides, the variable explains pretty well what the expression does. Anything else to refactor? Maybe a dataProvider would help eliminating the two almost identical lines from the tests, but I’m lazy now. I promise the next time I’m gonna write a similar assertion, I’ll do clean up the tests.

Hours must be numeric

The next simpliest test that came to my mind was:

public function testHoursMustBeNumeric()
{
    $this->assertFalse($this->validator->isValid('ab:42'));
}

Now we’re in Red, but it seems the next Refactoring phase will be a nice opportunity to get back to that promise.

How to fix that? I know I want to validate hours, so I’m going to create a new variable to be substituted into the regular expression:

public function isValid($time_specification)
{
    $hours = '[0-9]*';
    $not_colon = '[^:]*';
    return preg_match(
        "/^$hours:$not_colon\$/",
        $time_specification
    ) == 1;
}

The new variable is called hours and it matches any sequence of numbers. Yeah, Green again! That was just a matter of seconds. So anything to refactor here? Yes, remember what I promised a few minutes ago:

/**
 * @dataProvider provideInvalidTimeSpecifications
 */
public function testInvalidTimeSpecificationsAreNotAccepted($time_specification)
{
    $this->assertFalse($this->validator->isValid($time_specification));
}

public function provideInvalidTimeSpecifications()
{
    return array(
        'Time specification must contain at least one colon' => array('1042'),
        'Time specification must contain at most one colon' => array('10::42'),
        'Hours must be numeric' => array('ab:42'),
    );
}

Hours must be less than 24

Regardless of 12 hour time format, the hours part must be less than 24. This assertion is simple enough to create a new testcase from it:

public function provideInvalidTimeSpecifications()
{
return array(
    'Time specification must contain at least one colon' => array('1042'),
    'Time specification must contain at most one colon' => array('10::42'),
    'Hours must be numeric' => array('ab:42'),
    'Hours must be less than 24' => array('24:42'),
);
}

And the code that passes all the tests so far is not a big deal either:

public function isValid($time_specification)
{
    $hours = '([0-1]?[0-9]|2[0-3])';
    $not_colon = '[^:]*';
    return preg_match(
        "/^$hours:$not_colon\$/",
        $time_specification
    ) == 1;
}

Anything to refactor? Nope I guess.

Minutes must be numeric

One more simple testcase, just to see how well minutes are validated:

public function provideInvalidTimeSpecifications()
{
    return array(
        'Time specification must contain at least one colon' => array('1042'),
        'Time specification must contain at most one colon' => array('10::42'),
        'Hours must be numeric' => array('ab:42'),
        'Hours must be less than 24' => array('24:42'),
        'Minutes must be numeric' => array('10:ab'),
    );
}

Getting from Red to Green is children’s play as well, I did the same as in the case of the hours. Minutes are validated to be numeric, then a new testcase to make sure they are less than 60:

public function provideInvalidTimeSpecifications()
{
    return array(
        'Time specification must contain at least one colon' => array('1042'),
        'Time specification must contain at most one colon' => array('10::42'),
        'Hours must be numeric' => array('ab:42'),
        'Hours must be less than 24' => array('24:42'),
        'Minutes must be numeric' => array('10:ab'),
        'Minutes must be less than 60' => array('10:60'),
    );
}

And the code:

public function isValid($time_specification)
{
    $hours = '([0-1]?[0-9]|2[0-3])';
    $minutes = '([0-5]?[0-9])';
    return preg_match(
        "/^$hours:$minutes\$/",
        $time_specification
    ) == 1;
}

Anything to refactor? Not a single bit! Notice how nicely that ugly $not_colon variable disappeard? The regular expression is quite self-explanatory, it doesn’t even need a comment to be understandable.

Two digits

Both hours and minutes should be expected to be two digits:

public function provideInvalidTimeSpecifications()
{
    return array(
        'Time specification must contain at least one colon' => array('1042'),
        'Time specification must contain at most one colon' => array('10::42'),
        'Hours must be numeric' => array('ab:42'),
        'Hours must be less than 24' => array('24:42'),
        'Minutes must be numeric' => array('10:ab'),
        'Minutes must be less than 60' => array('10:60'),
        'Hours must be two-digit' => array('1:42'),
        'Minutes must be two-digit' => array('10:4'),
    );
}

Okay, those are two tests at the same time, but TDD allows you to go in your own tempo as long as you can clearly see the small, trivial steps you join together to take a bigger leap. In this case, the two tests are nearly identical, and the fix for them is similar as well: we only have to remove two question marks (why the hell did I write them in the first place?):

public function isValid($time_specification)
{
    $hours = '([0-1][0-9]|2[0-3])';
    $minutes = '([0-5][0-9])';
    return preg_match(
        "/^$hours:$minutes\$/",
        $time_specification
    ) == 1;
}

There’s no doubt that the trivial steps joined together here are visible for everyone, so let’s move on. What to refactor now? I can’t see anything. :-(

The good news is that we are now validating 24 hour format time specifications and our code is still readable, even the reqular expression.

The half of 24 is 12

But the work to validate it is going to be at least the double. Let’s start with some positive tests:

public function test12HourFormatIsAccepted()
{
    $this->assertTrue($this->validator->isValid('10:42 AM'));
}

That’s very similar to test24HourFormatIsAccepted(), but we’re not in Red, so we have to write even more code to get into Green!

What is the simplest regular expression that can make our new testcase pass? Yes, it’s only two characters, but a nice facepalm:

public function isValid($time_specification)
{
    $hours = '([0-1][0-9]|2[0-3])';
    $minutes = '([0-5][0-9])';
    return preg_match(
        "/^$hours:$minutes.*\$/",
        $time_specification
    ) == 1;
}

Now we accept any bullshit following a 24 hour format time specification. But we can refactor now:

/**
 * @dataProvider provideValidTimeSpecifications
 */
public function testValidTimeSpecificationsAreAccepted($time_specification)
{
    $this->assertTrue($this->validator->isValid($time_specification));
}

public function provideValidTimeSpecifications()
{
    return array(
        'Simple 24 hour time specification' => array('10:42'),
        'Simple 12 hour time specification' => array('10:42 AM'),
    );
}

Now we’re green, but we’ve just invalidated our assertion about the exactly one colon.

Exactly one colon is allowed

public function provideInvalidTimeSpecifications()
{
    return array(
        'Time specification must contain at least one colon' => array('1042'),
        'Time specification must contain at most one colon' => array('10::42'),
        'Time specification must not contain colon after minutes' => array('10:42:'),
        'Hours must be numeric' => array('ab:42'),
        'Hours must be less than 24' => array('24:42'),
        'Minutes must be numeric' => array('10:ab'),
        'Minutes must be less than 60' => array('10:60'),
        'Hours must be two-digit' => array('1:42'),
        'Minutes must be two-digit' => array('10:4'),
    );
}

The not so beautiful solution can be something like this:

public function isValid($time_specification)
{
    $hours = '([0-1][0-9]|2[0-3])';
    $minutes = '([0-5][0-9])';
    return preg_match(
        "/^$hours:{$minutes}[^:]*\$/",
        $time_specification
    ) == 1;
}

PHP thought I wanted to refer to $minutes as an array, so I put curly braces around the variable.

Now we’re in Green again, so we can refactor the code. First the tests:

public function provideValidTimeSpecifications()
{
    return array(
        '24 hour time specification' => array('10:42'),
        '12 hour time specification in the morning' => array('10:42 AM'),
        '12 hour time specification in the afternoon' => array('10:42 PM'),
    );
}

Okay, I’m cheating. I added another testcase that I know will be helpful later on, but it still passes with our current production code, so it won’t hurt. Apropo, the production code: a minor refactoring might be to reintroduce the variable we’ve deleted a few minutes ago.

public function isValid($time_specification)
{
    $hours = '([0-1][0-9]|2[0-3])';
    $minutes = '([0-5][0-9])';
    $not_colon = '[^:]*';
    return preg_match(
        "/^$hours:$minutes$not_colon\$/",
        $time_specification
    ) == 1;
}

At least it’s readable. Let’s move on, that $not_colon will not haunt us too long.

Bad idea

I want to get rid of $not_colon as soon as possible, so let’s force the evolution of the regexp into a direction that does not involve $not_colon existing any longer. I want a space after the minutes part of the time specification:

public function provideInvalidTimeSpecifications()
{
    return array(
        'Time specification must contain at least one colon' => array('1042'),
        'Time specification must contain at most one colon' => array('10::42'),
        'Time specification must not contain colon after minutes' => array('10:42:'),
        'Hours must be numeric' => array('ab:42'),
        'Hours must be less than 24' => array('24:42'),
        'Minutes must be numeric' => array('10:ab'),
        'Minutes must be less than 60' => array('10:60'),
        'Hours must be two-digit' => array('1:42'),
        'Minutes must be two-digit' => array('10:4'),
        '12 hour time specification must contain a space after minutes' => array('10:42AM'),
    );
}

The first idea that came to my mind to make that pass was to insert a space before $not_colon in the regexp:

"/^$hours:$minutes $not_colon\$/"

That makes the 24 hour format tests fail, so I deleted both the space and the new testcase. Though the new testcase was a teeny-tiny little step, it’d have required bigger changes in the regular expression, namely to separate the two kinds of hour format. Instead I introduced a more strict testcase in provideInvalidTimeSpecifications() that is comparable to the size of the changes to make it pass while not breaking all the other tests:

'12 hour format must end with AM or PM' => array('10:42 XY'),

The production code to pass that test:

public function isValid($time_specification)
{
    $hours = '([0-1][0-9]|2[0-3])';
    $minutes = '([0-5][0-9])';
    $ampm = '( [AP]M)';
    return preg_match(
        "/^$hours:$minutes$ampm?\$/",
        $time_specification
    ) == 1;
}

Reviewing that now I can see I could have made the original testcase pass by applying the same changes I did for this case, but the test I deleted was not suggesting it. It seems to me that TDD is not about coding skills but about testing skills. Writing good code is one thing, but writing good tests that enforce building up good code step-by-step can be at least as much important.

Differentiating 12 hour and 24 hour formats

Time for a bigger change. A new testcase for the negative tests:

'24 hour format must be exactly 5 characters long' => array('14:42 PM'),

To make that pass, the regular expression must recognize the difference between the two kinds of time formats. It requires matching numbers less than 13, but most of the code needed already exists.

public function isValid($time_specification)
{
    $minutes = '([0-5][0-9])';

    $hours_less_than_24 = '([0-1][0-9]|2[0-3])';
    $time_spec_24 = "$hours_less_than_24:$minutes";

    $hours_less_than13 = '(0[0-9]|1[012])';
    $ampm = '( [AP]M)';
    $time_spec_12 = "$hours_less_than13:$minutes$ampm";

    return preg_match(
        "/^($time_spec_24|$time_spec_12)\$/",
        $time_specification
    ) == 1;
}

It seems to be a big change, but actually it’s quite simple: the question mark has been substituted with the two alternatives it specified: one is hour specification without $ampm, the other is the same, but with $ampm. Of course the second alternative should not match hours greater than 12, so it required a second type of hour matching regular expression. When everything worked, some extractions and variable renames were done.

Attempt to write more failing tests

Are we ready? The code in isValid() seems to contain all the information appearing in the requirements, so there must be only a few small steps remaining. Let’s try adding some new tests. These should not be considered valid:

'Hour part of 12 hour format must be less than 13' => array('13:00 PM'),
'12 hour format must be exactly 8 characters long' => array('12:45  AM'),

They pass without touching the code, we couldn’t manage to get into Red. This means there’s some room for further refactoring. All the dirty details of building our regular expressions can go to a private method buried at the bottom of the class:

class TimeValidator
{
    public function isValid($time_specification)
    {
        return preg_match($this->buildRegExp(), $time_specification) == 1;
    }

    private function buildRegExp()
    {
        $minutes = '([0-5][0-9])';

        $hours_less_than_24 = '([0-1][0-9]|2[0-3])';
        $time_spec_24 = "$hours_less_than_24:$minutes";

        $hours_less_than13 = '(0[0-9]|1[012])';
        $ampm = '( [AP]M)';
        $time_spec_12 = "$hours_less_than13:$minutes$ampm";

        return "/^($time_spec_24|$time_spec_12)\$/";
    }
}

Maybe isValid() is easier to read like this:

public function isValid($time_specification)
{
    return 1 == preg_match($this->buildRegExp(), $time_specification);
}

So is there anything to test? Well, I’m sure PHP has a gotcha for us:

'Multiline strings are not accepted' => array("10:42\n"),

Bang, it fails! The fix is very simple:

return "/^($time_spec_24|$time_spec_12)\$/D";

Sometimes it’s worth reading around PCRE modifiers in PHP documentation, just to make time pass faster. :-)

Leading zeros

One more thing I miss: though the code cares about leading zeros, there’s no testcase mentioning them. So these should be valid:

'12 hour time specification with leading zero' => array('00:42 PM'),
'24 hour time specification with leading zero' => array('00:42'),

Doesn’t that 00:42 PM look weird? I’m not used to 12 hour time format, so I had to think about it for a few seconds: in 12 hour notation there’s no such thing as 00:00. Hours can be between 01 and 12, which means that our two new passing tests should look like:

'12 hour time specification with leading zero' => array('01:42 PM'),
'24 hour time specification with leading zero' => array('00:42'),

(Yes, they pass as expected. Note that I forgot this speciality of 12 hour notation when writing the requirements!)

And the bug can be easily triggered by this negative test:

'12 hour time specification does not allow hours to be zero' => array('00:42 PM'

And the code fixing it:

$hours_less_than13 = '(0[1-9]|1[012])';

The whole code

After adding some corner cases to the list of positive tests, the final results are:

TimeValidator.php

class TimeValidator
{
    public function isValid($time_specification)
    {
        return 1 == preg_match($this->buildRegExp(), $time_specification);
    }

    private function buildRegExp()
    {
        $minutes = '([0-5][0-9])';

        $hours_less_than_24 = '([0-1][0-9]|2[0-3])';
        $time_spec_24 = "$hours_less_than_24:$minutes";

        $hours_less_than13 = '(0[1-9]|1[012])';
        $ampm = '( [AP]M)';
        $time_spec_12 = "$hours_less_than13:$minutes$ampm";

        return "/^($time_spec_24|$time_spec_12)\$/D";
    }
}

TimeValidatorTest.php

class TimeValidatorTest extends PHPUnit_Framework_TestCase
{
    public function setUp()
    {
        $this->validator = new TimeValidator();
    }

    /**
     * @dataProvider provideValidTimeSpecifications
     */
    public function testValidTimeSpecificationsAreAccepted($time_specification)
    {
        $this->assertTrue($this->validator->isValid($time_specification));
    }

    public function provideValidTimeSpecifications()
    {
        return array(
            '24 hour time specification' => array('10:42'),
            '12 hour time specification in the morning' => array('10:42 AM'),
            '12 hour time specification in the afternoon' => array('10:42 PM'),
            '12 hour time specification with leading zero' => array('01:42 PM'),
            '24 hour time specification with leading zero' => array('00:42'),
			'First minute of a 24 hour specification' => array('00:00'),
            'First minute of a 12 hour specification' => array('01:00 AM'),
            'Last 12 hour time specification in the morning' => array('12:59 AM'),
            'Last 12 hour time specification in the afternoon' => array('12:59 PM'),
            'Last 24 hour time specification' => array('23:59'),
        );
    }

    /**
     * @dataProvider provideInvalidTimeSpecifications
     */
    public function testInvalidTimeSpecificationsAreNotAccepted($time_specification)
    {
        $this->assertFalse($this->validator->isValid($time_specification));
    }

    public function provideInvalidTimeSpecifications()
    {
        return array(
            'Time specification must contain at least one colon' => array('1042'),
            'Time specification must contain at most one colon' => array('10::42'),
            'Time specification must not contain colon after minutes' => array('10:42:'),
            'Hours must be numeric' => array('ab:42'),
            'Hours must be less than 24' => array('24:42'),
            'Minutes must be numeric' => array('10:ab'),
            'Minutes must be less than 60' => array('10:60'),
            'Hours must be two-digit' => array('1:42'),
            'Minutes must be two-digit' => array('10:4'),
            '12 hour format must end with AM or PM' => array('10:42 XY'),
            '24 hour format must be exactly 5 characters long' => array('14:42 PM'),
            'Hour part of 12 hour format must be less than 13' => array('13:00 PM'),
            '12 hour format must be exactly 8 characters long' => array('12:45  AM'),
            'Multiline strings are not accepted' => array("10:42\n"),
            '12 hour time specification does not allow hours to be zero' => array('00:42 PM'),
        );
    }

    private $validator;
}

You can check them out at GitHub.

Conclusion

The whole code copied above took about 45-60 minutes to write (including taking notes for this blog post), during which I’ve learnt a lot of things about practising TDD. I’m sure that if I tried to reproduce this experiment from scratch, it would take several steps less while still moving with teeny-tiny steps forward.

Just as any part of code, regular expressions also worth writing a bunch of unit tests that make sure that the code can be changed anytime later, either to fix bugs or to extend functionality. The big advantage of unit tests (besides architectural flexibility, documentation, etc.) is not that they can expose any possible bug that can be imagined but they allow the code to be changed without worrying much about breaking existing, well-tested functionality.

But there are some gotchas in the field as well:

  • There’s no tool to measure coverage for regular expressions that I’m aware of at the moment. The most useful hit I found with Google doesn’t help much either.
  • It’s very easy to overlook bugs and miss testcases during test-driving regular expressions. (When the regexp engine that comes bundled with your language has some gotchas to offer, well, that doesn’t make the situation much better.)
  • One character change in the regular expression may require a bunch of new testcases to be written.