Arabic to Roman Numerals Conversion with PHP and Regex
During this week’s CodeKata, we were asked to write a tool to convert Arabic numbers (1, 5, 100, etc.) into Roman numerals (I, V, C, etc.). For those unfamiliar with CodeKata, the point of the exercise is not simply to find a working solution, but to practice our approaches to problem-solving, pair programming, and TDD.
As a bit of a running joke once the serious exercise was completed, I tried to solve the problem using regular expressions in PHP. The result is a single method, and it is certainly not meant to be an efficient or best-practice way to approach the problem in a production environment.
The very first step is to represent the Arabic number using just the numeral I, e.g. 1 becomes I and 18 becomes IIIIIIIIIIIIIIIIII.
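In PHP this step is a single call to the built-in str_repeat function. A minimal standalone illustration (the variable name $unary exists only for this example):

```php
<?php
// Unary step: one I for every unit of the number.
$unary = str_repeat('I', 18);

echo $unary, PHP_EOL;         // a run of eighteen I's
echo strlen($unary), PHP_EOL; // 18
```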
If, for a moment, we concern ourselves just with sorting out the numerals I, V, and X, there are two phases of regex replacement that need to be completed:
Phase 1:
- All occurrences of IIIII (5) need to be replaced with V;
- Then, all occurrences of VV (10) need to be replaced with X.
Phase 2:
- All remaining occurrences of IIII (4) need to be replaced with IV;
- Then, any instances of VIV (9, which started as VIIII before the previous step) need to be replaced with IX.
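Restricted to I, V and X, the two phases can be tried out on their own. The sketch below is not the author's code: toRomanBelow40 is a hypothetical helper that applies the four substitutions in order, and it is only correct for 1 to 39, since 40 and above need the L and C numerals.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: handles only the I/V/X group, so it is correct for 1..39.
function toRomanBelow40(int $n): string
{
    $s = str_repeat('I', $n); // unary representation

    // Phase 1: IIIII -> V, then VV -> X
    $s = preg_replace('/I{5}/', 'V', $s);
    $s = preg_replace('/V{2}/', 'X', $s);

    // Phase 2: IIII -> IV, then VIV -> IX
    $s = preg_replace('/I{4}/', 'IV', $s);
    $s = preg_replace('/VIV/', 'IX', $s);

    return $s;
}

foreach ([4, 9, 14, 18, 39] as $n) {
    echo $n, ' => ', toRomanBelow40($n), PHP_EOL;
}
// 4 => IV
// 9 => IX
// 14 => XIV
// 18 => XVIII
// 39 => XXXIX
```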
The pattern of replacements in the phases above is exactly the same if we multiply all the numbers by 10. What works for 1 (I), 5 (V) and 10 (X) works in the same way for 10 (X), 50 (L) and 100 (C), and again for 100 (C), 500 (D) and 1000 (M).
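Phase 1 needs to be completed for each of these groups (IVX, XLC, CDM) first, and only then can Phase 2 be completed for each group, so that the substitutions are made in the correct order. Within Phase 1 the groups are handled from smallest to largest, because merging I's into V's and X's produces the X's that the XLC group then works on (and likewise the C's for the CDM group).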
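To watch the ordering at work, the hypothetical traceRoman function below performs the same substitutions as the method further down, but records the string after every (phase, group) step. It is a tracing aid written for this explanation, not part of the original class. For 94, Phase 1 leaves the additive form LXXXXIIII, and Phase 2 then rewrites it into XCIV:

```php
<?php
declare(strict_types=1);

// Hypothetical tracing helper: same substitutions as arabicToRoman(),
// but returns every intermediate string, keyed by "phase/group".
function traceRoman(int $n): array
{
    $s = str_repeat('I', $n);
    $trace = [];
    $phases = [1 => '/I{5}/ V /V{2}/ X', 2 => '/I{4}/ IV /VIV/ IX'];

    foreach ($phases as $phase => $p) {
        foreach (['IVX', 'XLC', 'CDM'] as $group) {
            $t = explode(' ', strtr($p, 'IVX', $group));
            $s = preg_replace([$t[0], $t[2]], [$t[1], $t[3]], $s);
            $trace["$phase/$group"] = $s;
        }
    }
    return $trace;
}

foreach (traceRoman(94) as $step => $value) {
    echo str_pad($step, 6), $value, PHP_EOL;
}
// 1/IVX XXXXXXXXXIIII
// 1/XLC LXXXXIIII
// 1/CDM LXXXXIIII
// 2/IVX LXXXXIV
// 2/XLC XCIV
// 2/CDM XCIV
```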
In the code below, the first foreach loop iterates through each of the phases above. The strings in the array each contain four space-separated tokens, representing two find-replace pairs. E.g. /I{5}/ V will be used to replace IIIII with V, and /V{2}/ X will be used to replace VV with X.
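Splitting one of these phase strings on its spaces gives the two pairs directly, with the patterns at the even positions and their replacements at the odd ones. A small illustration using the Phase 1 string:

```php
<?php
// Each phase string holds two find/replace pairs separated by spaces.
$phase = '/I{5}/ V /V{2}/ X';
$tokens = explode(' ', $phase); // ['/I{5}/', 'V', '/V{2}/', 'X']

$patterns     = [$tokens[0], $tokens[2]]; // the regexes
$replacements = [$tokens[1], $tokens[3]]; // what each regex match becomes

print_r(array_combine($patterns, $replacements));
```

This only works because no individual token contains a space, which holds for every pattern the method uses.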
The second (inner) foreach loop iterates through our groups of numerals, each group being worth ten times the one before it.
We use strtr($p, 'IVX', $r) to translate the I, V and X characters in the phase's find-replace tokens into the current group's numerals, since the patterns are otherwise identical. E.g. for the XLC group, the phase /I{5}/ V /V{2}/ X becomes /X{5}/ L /L{2}/ C.
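A quick check of the translation for all three groups. Three-argument strtr translates each character once, in a single pass, so an I that becomes X for the XLC group is not then turned into C. By contrast, str_replace with arrays replaces sequentially and would turn I into X and then into C.

```php
<?php
$phase = '/I{5}/ V /V{2}/ X';
$translated = [];

foreach (['IVX', 'XLC', 'CDM'] as $group) {
    // I -> $group[0], V -> $group[1], X -> $group[2];
    // slashes, braces, digits and spaces are not in 'IVX', so they pass through untouched.
    $translated[$group] = strtr($phase, 'IVX', $group);
    echo $translated[$group], PHP_EOL;
}
// /I{5}/ V /V{2}/ X
// /X{5}/ L /L{2}/ C
// /C{5}/ D /D{2}/ M
```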
The last step after this translation is to explode the space-separated string and feed the tokens into the correct parameters of preg_replace: tokens 0 and 2 form the array of patterns, and tokens 1 and 3 the array of replacements.
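One detail that makes the single preg_replace call work: when it is given arrays, preg_replace applies each pattern in turn to the result of the previous one. So within one call, the V's produced by the first pair are already visible to the second. For example, with ten I's:

```php
<?php
$s = str_repeat('I', 10); // ten I's

// Pattern 1 turns the ten I's into VV; pattern 2 then sees VV and turns it into X.
$result = preg_replace(['/I{5}/', '/V{2}/'], ['V', 'X'], $s);

echo $result, PHP_EOL; // X
```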
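```php
<?php
namespace Seniorio;

class Numeralizor
{
    /**
     * Convert an Arabic number to Roman numerals using only regex substitutions.
     */
    public function arabicToRoman($n)
    {
        // Start from the unary form: $n copies of I.
        $n = str_repeat('I', $n);

        // Outer loop: Phase 1 (merge runs upwards), then Phase 2 (subtractive forms).
        foreach (array('/I{5}/ V /V{2}/ X', '/I{4}/ IV /VIV/ IX') as $p) {
            // Inner loop: apply the same phase to each group, smallest first.
            foreach (array('IVX', 'XLC', 'CDM') as $r) {
                // Translate the I/V/X tokens into this group, then split on spaces.
                $a = explode(' ', strtr($p, 'IVX', $r));
                // Tokens 0 and 2 are patterns; tokens 1 and 3 are their replacements.
                $n = preg_replace(array($a[0], $a[2]), array($a[1], $a[3]), $n);
            }
        }

        return $n;
    }
}
```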
Just a bit of fun. You can find the accompanying 3000-line(!) PhpSpec test on GitHub.
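For a quick sanity check without the full spec, one option is to compare the regex approach against a conventional greedy converter for every value from 1 to 3999. The sketch below is an assumption of how that could look: regexRoman is a copy of the method as a free function so the file runs on its own, and greedyRoman is a hypothetical reference implementation, not part of the original project.

```php
<?php
declare(strict_types=1);

// Copy of the article's method as a free function, so this file is self-contained.
function regexRoman(int $n): string
{
    $s = str_repeat('I', $n);
    foreach (['/I{5}/ V /V{2}/ X', '/I{4}/ IV /VIV/ IX'] as $phase) {
        foreach (['IVX', 'XLC', 'CDM'] as $group) {
            $t = explode(' ', strtr($phase, 'IVX', $group));
            $s = preg_replace([$t[0], $t[2]], [$t[1], $t[3]], $s);
        }
    }
    return $s;
}

// Hypothetical reference: the usual greedy conversion using subtractive pairs.
function greedyRoman(int $n): string
{
    $table = [
        1000 => 'M', 900 => 'CM', 500 => 'D', 400 => 'CD',
        100  => 'C', 90  => 'XC', 50  => 'L', 40  => 'XL',
        10   => 'X', 9   => 'IX', 5   => 'V', 4   => 'IV', 1 => 'I',
    ];
    $out = '';
    foreach ($table as $value => $numeral) {
        // Take the largest value that still fits, as many times as it fits.
        while ($n >= $value) {
            $out .= $numeral;
            $n -= $value;
        }
    }
    return $out;
}

$mismatches = [];
for ($i = 1; $i <= 3999; $i++) {
    if (regexRoman($i) !== greedyRoman($i)) {
        $mismatches[] = $i;
    }
}
echo count($mismatches) === 0 ? 'all match' : 'mismatches: ' . implode(', ', $mismatches), PHP_EOL;
// expected: all match
```

The range stops at 3999 because standard Roman numerals have no single symbol above M; for 4000 and up, both functions simply keep repeating M.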