Arabic to Roman Numerals Conversion with PHP and Regex

Sun, 11 Aug 2013

During this week’s CodeKata, we were asked to write a tool to convert Arabic numbers (1, 5, 100, etc.) into Roman Numerals (I, V, C, etc.). For those unfamiliar with CodeKata, the point of the exercise is not to find a working solution, but to practice our approaches to problem-solving, pair-programming, and TDD.

As a bit of a running joke once the serious exercise was completed, I tried to solve the problem using regular expressions in PHP. This is written in a single method and is certainly not supposed to be an efficient or best-practice way to approach the problem in a production environment.

The very first step is to represent the Arabic number using just the numeral I, e.g. 1 becomes I and 18 becomes IIIIIIIIIIIIIIIIII.
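As a quick illustration of this unary step, here is a minimal sketch assuming PHP 7.1 or later. The helper name toUnary is hypothetical; the actual method simply calls str_repeat inline.

```php
<?php

// Hypothetical helper: turn an Arabic number into a run of I's.
function toUnary(int $n): string
{
    return str_repeat('I', $n);
}

echo toUnary(1), "\n";          // I
echo strlen(toUnary(18)), "\n"; // 18
```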

If, for a moment, we concern ourselves just with sorting out the numerals I, V, and X, there are two phases of regex replacement that need to be completed:

Phase 1:

All occurrences of IIIII (5) need to be replaced with V;
Then, all occurrences of VV (10) need to be replaced with X;

Phase 2:

All remaining occurrences of IIII (4) need to be replaced with IV;
Then, any instances of VIV (9) need to be replaced with IX;
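
The two phases above can be sketched for the I, V, X group on its own. This is only an illustration, assuming PHP 7.1 or later: the function name ivxOnly is made up, and because it knows nothing about L, C, D or M it is only meaningful for 1 to 39.

```php
<?php

// Hypothetical helper: applies only the I/V/X replacements.
function ivxOnly(int $n): string
{
    $s = str_repeat('I', $n);

    // Phase 1: collapse fives, then tens.
    $s = preg_replace('/I{5}/', 'V', $s);
    $s = preg_replace('/V{2}/', 'X', $s);

    // Phase 2: introduce the subtractive forms.
    $s = preg_replace('/I{4}/', 'IV', $s);
    $s = preg_replace('/VIV/', 'IX', $s);

    return $s;
}

foreach ([4, 9, 14, 18, 19] as $n) {
    echo $n, ' => ', ivxOnly($n), "\n";
}
// 4 => IV
// 9 => IX
// 14 => XIV
// 18 => XVIII
// 19 => XIX
```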

The pattern of replacements in the phases above is exactly the same if we multiply all the numbers by 10.

What works for 1 (I), 5 (V) and 10 (X) works equally for 10 (X), 50 (L) and 100 (C), and again for 100 (C), 500 (D) and 1000 (M).

Phase 1 needs to be completed for each of these groups (IVX, XLC, CDM) first, then Phase 2 can be completed for each group afterwards, so that the substitutions are made in the correct order.
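
To make that ordering concrete, here is a sketch with all twelve replacements written out by hand, Phase 1 for every group first and Phase 2 afterwards. The function name traceRoman is hypothetical and PHP 7.1 or later is assumed; it returns the string after every step so the intermediate states can be inspected.

```php
<?php

// Hypothetical helper: runs the replacements in order and records each result.
function traceRoman(int $n): array
{
    $steps = [
        // Phase 1, smallest group first.
        ['/I{5}/', 'V'], ['/V{2}/', 'X'],
        ['/X{5}/', 'L'], ['/L{2}/', 'C'],
        ['/C{5}/', 'D'], ['/D{2}/', 'M'],
        // Phase 2, same group order.
        ['/I{4}/', 'IV'], ['/VIV/', 'IX'],
        ['/X{4}/', 'XL'], ['/LXL/', 'XC'],
        ['/C{4}/', 'CD'], ['/DCD/', 'CM'],
    ];

    $s = str_repeat('I', $n);
    $trace = [];
    foreach ($steps as [$find, $replace]) {
        $s = preg_replace($find, $replace, $s);
        $trace[] = $s;
    }

    return $trace;
}

$trace = traceRoman(94);
echo $trace[2], "\n";  // LXXXXIIII (after X{5} => L)
echo $trace[6], "\n";  // LXXXXIV   (after I{4} => IV)
echo $trace[9], "\n";  // XCIV      (after LXL => XC)
echo end($trace), "\n"; // XCIV
```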

In the code below, the first foreach loop iterates through each of the phases above. The strings in the array each contain four space-separated tokens, representing two find-replace pairs. E.g. /I{5}/ V will be used to replace IIIII with V, and /V{2}/ X will be used to replace VV with X.

The second foreach loop iterates through our groups of numerals (each worth ten times the last).

We use strtr($p, 'IVX', $r) to translate all the find-replace tokens in the phase to the correct multiple-of-ten group, as the patterns are identical. E.g. the phase /I{5}/ V /V{2}/ X will become /X{5}/ L /L{2}/ C.
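
A small sketch of that translation on its own, using the three-argument form of strtr, which maps characters one for one. The variable names here are illustrative only. It works because the letters I, V and X appear nowhere else in the pattern syntax.

```php
<?php

$phase = '/I{5}/ V /V{2}/ X';

foreach (['IVX', 'XLC', 'CDM'] as $group) {
    // I, V, X are swapped for the 1st, 2nd, 3rd letter of the group.
    echo strtr($phase, 'IVX', $group), "\n";
}
// /I{5}/ V /V{2}/ X
// /X{5}/ L /L{2}/ C
// /C{5}/ D /D{2}/ M
```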

The last step after this translation is to explode the space-separated string and feed the tokens into the correct parameters of preg_replace.
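
That final step might look like this in isolation. It is a sketch with illustrative variable names, using a string of twelve X's (120) as input. When preg_replace is given arrays, it applies each pattern in turn to the result of the previous one, which is what lets the second replacement see the L's produced by the first.

```php
<?php

$tokens = explode(' ', '/X{5}/ L /L{2}/ C');
// $tokens is now ['/X{5}/', 'L', '/L{2}/', 'C']

$result = preg_replace(
    [$tokens[0], $tokens[2]], // patterns:     /X{5}/ then /L{2}/
    [$tokens[1], $tokens[3]], // replacements: L      then C
    str_repeat('X', 12)
);

echo $result, "\n"; // CXX
```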

<?php

namespace Seniorio;

class Numeralizor
{
    public function arabicToRoman($n)
    {
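        // Start in unary: n becomes a string of n I's.
        $n = str_repeat('I', $n);

        // Outer loop: Phase 1 (collapse fives and tens), then Phase 2 (subtractive forms).
        foreach (array('/I{5}/ V /V{2}/ X', '/I{4}/ IV /VIV/ IX') as $p) {
            // Inner loop: apply the phase to each numeral group, smallest first.
            foreach (array('IVX', 'XLC', 'CDM') as $r) {
                // Translate the I/V/X template to this group, then split it into
                // pattern, replacement, pattern, replacement.
                $a = explode(' ', strtr($p, 'IVX', $r));
                $n = preg_replace(array($a[0], $a[2]), array($a[1], $a[3]), $n);
            }
        }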

        return $n;
    }
}

Just a bit of fun. You can find the accompanying 3000-line(!) PhpSpec test on GitHub.