Adventures on pidigits benchmark and F#
In my previous post regarding the pidigits benchmark comparison, I’ve been a little unfair towards F#, which I really like. Now it’s time to amend.
Background
It’s all there, on the benchmarksgame site.
Adapt the step-by-step algorithm given on pages 4,6 & 7 of [pdf 156KB] Unbounded Spigot Algorithms for the Digits of Pi. (Not the deliberately obscure version given on page 2. Not the Rabinowitz-Wagon algorithm.)
All right, it’s based on this formula:
And then
And then, quoting the paper, we can see this as which is no other than the composition of a dreadful infinite series of linear fractional transformations or Möbius transformations (you never guessed it.)
These are functions taking $x$ to $\frac{q x+r}{s x+t}$ for integers $q,r,s$ and $t$ with $q t-r s \neq 0$—that is, yielding a ratio of integer-coefficient linear transformations of $x$. Such a transformation can be represented by the four coefficients $q,r,s$ and $t$, and if they are arranged as a matrix $\left(\begin{array}{ll}{q} & {r} \ {s} & {t}\end{array}\right)$ then function composition corresponds to matrix multiplication.
Code
Long story short, a couple of simplifications and factorizations later, we got this beautiful haskell code :
-- The Computer Language Benchmarks Game
-- https://salsa.debian.org/benchmarksgame-team/benchmarksgame/
-- by Don Stewart, Einar Karttunen and Branimir Maksimovic
--
import System.Environment
data F = F !Integer !Integer !Integer !Integer
main = loop 10 0 . flip take (str (F 1 0 0 1) ns) . read . head =<< getArgs
ns = [ F k (4*k+2) 0 (2*k+1) | k <- [1..] ]
loop n s [] = putStrLn $ replicate n ' ' ++ "\t:" ++ show s
loop 0 s xs = putStrLn ("\t:"++show s) >> loop 10 s xs
loop n s (x:xs) = putStr (show x) >> loop (n-1) (s+1) xs
flr x (F q r s t) = (q*x + r) `div` (s*x + t)
comp (F q r s t) (F u v w x) = F (q*u+r*w) (q*v+r*x) (s*u+t*w) (s*v+t*x)
str z (x:xs) | y == flr 4 z = y : str (comp (F 10 (-10*y) 0 1) z) (x:xs)
| otherwise = str (comp z x) xs where y = flr 3 z
Some details
Bear in mind that, with this algorithm, somewhere at the fourth or the fifth decimal we already reached the numeric limit of an unsigned 64 bits integer in z
. So the implementation should handle “no-limit” integers, known as big integers.
Some remarks on the code:
.
is composition in haskell.flip
is just there to exchange arguments in the composition function for thetake
function (don’t bother with it)
More important :
- the type
F
stores 4Integer
(they are big integers in haskell by default) ns
is a infinite loop feeding the algorithmflr
is the digit extractioncomp
does the Möbius thing.str
is the main recursive loop wherez
is the state matrix andx:xs
the pattern-matched input list. It checks if the extracted digit on z has the property of being the next “pi” digit. If so don’t consume input, update the state for base-10, otherwise consume it and do the Möbius-thing next.loop
does the printing, only printing digits byn
-sized groups and printing the number of already computing so far.main
puts everything in motion, gets the command-line (number of asked digits), launchs the recursivestr
loop with the initial state and prints it as it goes with the composedloop
function.
Ok, now let’s do it in F#
First, some utility functions. This is to measure the computation time
let duration f =
let timer = new System.Diagnostics.Stopwatch()
timer.Start()
let oldout = System.Console.Out
System.Console.SetOut(System.IO.TextWriter.Null) // Redirect the output
let returnValue = f()
System.Console.SetOut(oldout)
let sec = (float timer.ElapsedMilliseconds) / 1000.0
sec
And this is to plot a bar chart of the computation times wrt the number of digits
// #load "XPlot.Plotly.Paket.fsx"
#load "XPlot.Plotly.fsx"
open XPlot.Plotly
let plot f range =
range
|> Seq.head
|> (fun x -> duration (fun _ -> f x))
|> ignore // Dry-run the first run to filter out the jit-compilation time on the next ones
let data =
range
|> Seq.map (fun x -> (x,duration (fun _ -> f x)))
let c =
Chart.Bar data
|> Chart.WithXTitle "Time in seconds"
|> Chart.WithYTitle "Number of digits"
c
First attempt, tail recursion woes
type F = F of bigint * bigint * bigint * bigint
let ns =
Seq.unfold (fun k -> Some(F (k, 4I * k + 2I, 0I, 2I*k + 1I),k+1I) ) 1I
let rec loop = function
| n, s, l when Seq.isEmpty l -> printfn "%s\t:%d" (String.replicate n " ") s
| 0, s, xs -> printfn "\t:%d" s;loop (10,s,xs)
| n, s, l -> printf "%d" (Seq.head l); loop ((n-1),(s+1),(Seq.tail l |> Seq.cache))
let flr x (F (q,r,s,t)) = (q*x + r) / (s*x + t)
let comp (F (q,r,s,t)) (F (u,v,w,x)) = F (q*u + r*w, q*v + r*x, s*u + t*w, s*v + t*x)
let rec str z l =
let y = flr 3I z
match z,l with
| z, l when y = flr 4I z -> seq { yield (int y); yield! str (comp (F (10I,-10I*y,0I,1I)) z) l }
| z, l -> str (comp z (Seq.head l)) (Seq.tail l |> Seq.cache)
let mainPi n =
ns
|> str (F(1I,0I,0I,1I))
|> Seq.take n
|> Seq.cache
|> (fun x -> loop (10,0,x))
mainPi 27
3141592653 :10
5897932384 :20
6264338 :27
Pretty straightforward from the haskell implementation uh? Except that… if we run
mainPi 5000
we got a StackOverflowException
.
And that’s because the Seq.tail |> Seq.cache
doesn’t trigger the tail-recursion optimization so the cached Seq.tail
are stored on the stack, over and over. And that’s one of our first gotcha in F#: Seq
are NOT lists as in usual functional programming way. Hence, we should ditch the Seq.tail
thing. F#’s List
is eager, so it doesn’t lazy-evaluate in a nice functionnal “pipelining” way. We could define our LazyList<T>
to do that, for a start.
Lazyness (almost) everywhere
Let’s define our LazyList
type with the usual car/cdr semantics, the only difference with a “normal” list type is the “lazy” cdr. Documentation for the lazy evaluation in F#. And add some usual “list” functions for good measure.
type LazyList<'a> =
| Empty // Nil
| Cons of 'a * Lazy<LazyList<'a>> with // Cons (Car, Cdr)
// Generator
static member unfold (f : 's -> ('a*'s) option) (init : 's) : LazyList<'a> =
match f init with
| None -> Empty
| Some (a, s) -> Cons (a, lazy ( LazyList.unfold f s))
// Selecting n first elements
static member take n (l : LazyList<'a>) : LazyList<'a> =
match l with
| Empty -> Empty
| Cons (a, t) -> if n = 0 then Empty
else Cons (a, lazy (LazyList.take (n-1) (t.Force())))
Now we convert all Seq<T>
functions to LasyList<T>
, and we could do a little more pattern matching along the way.
type F = F of bigint * bigint * bigint * bigint
let ns =
LazyList.unfold (fun k -> Some(F (k, 4I * k + 2I, 0I, 2I*k + 1I),k+1I) ) 1I
let rec loop = function
| n, s, Empty -> printfn "%s\t:%d" (String.replicate n " ") s
| 0, s, xs -> printfn "\t:%d" s;loop (10,s,xs)
| n, s, Cons (x,xs) -> printf "%d" x;loop (n-1,s+1,xs.Force())
let flr x (F (q,r,s,t)) = (q*x + r) / (s*x + t)
let comp (F (q,r,s,t)) (F (u,v,w,x)) = F (q*u + r*w, q*v + r*x, s*u + t*w, s*v + t*x)
let rec str z l =
let y = flr 3I z
match z,l with
| z, l when y = flr 4I z -> Cons(int y,lazy(str (comp (F (10I,-10I*y,0I,1I)) z) l))
| z, Cons(x,xs) -> str (comp z x) (xs.Force())
| z, Empty -> str z l
let mainPi n =
ns
|> str (F(1I,0I,0I,1I))
|> LazyList.take n
|> (fun x -> loop (10,0,x))
mainPi 27
3141592653 :10
5897932384 :20
6264338 :27
The only real changes from the first version with Seq
is that we now enforce proper tail-recursion via the Lazy<T>.Force()
call.
plot mainPi [1000..1000..10000]
And that’s a very nice improvement over the Seq.tail
version. We’re still leagues away from the haskell performance, more on that later.
No, we’re not that Lazy
Now we’re wondering if we could go a little more F#-idiomatic, without resorting to LazyList
. First, let’s take care of the printing with a nice Seq.scan
Like
Seq.fold
, but computes on-demand and returns the sequence of intermediate and final results.
// Signature:
Seq.scan : ('State -> 'T -> 'State) -> 'State -> seq<'T> -> seq<'State>
// Usage:
Seq.scan folder state source
let unloop n =
let fmt = Printf.StringFormat<string->int->string>("%-" + (string n) + "s\t:%d\n")
Seq.map (fun y -> char (48 + y))
>> Seq.chunkBySize n
>> Seq.scan (fun (s,_) l -> let s = s + Array.length l
(s,sprintf fmt (System.String l) s)) (0,"")
>> Seq.iter (snd >> System.Console.Write)
It first “transforms” digits as bytes, groups them by array of 10 and accumulate them with Seq.scan
with the number of digits with the current output of ten digits nicely formatted as state and finally just write those outputs to the console
(The StringFormat
trick is documented there)
seq [ 3; 4; 4; 1; 2; 4; 5 ] |> unloop 3
344 :3
124 :6
5 :7
Now what if instead of just generating the sequence of k
consumed by the comp
function, we move all the computation inside our Seq.unfold
generator ? As for the unfold function state, we have just to store k
and z
Returns a sequence that contains the elements generated by the given computation.
// Signature:
Seq.unfold : ('State -> ('T * 'State) option) -> 'State -> seq<'T>
// Usage:
Seq.unfold generator state
All we have to do is to store in our state the remaining digits to compute, the k
and the z
accumulator. When we got a digit we “yield” a Some int
otherwise a None. When n
reachs 0 we stop. In the main call, we filter out all the None
values.
let str (n,k,z) = // n : remaining digits to compute, k : to be consumed
let y = flr 3I z
let yt = flr 4I z
match n,y with
| 0,_ -> None
| _,y when y = yt -> Some(Some (int y),(n-1,k ,comp (F(10I,(-10I*y),0I,1I)) z))
| _,_ -> Some(None ,(n ,k+1I,comp z (F(k,(4I*k+2I),0I,(2I*k+1I)))))
let mainPi n =
Seq.unfold str (n,1I,F(1I,0I,0I,1I))
|> Seq.choose id
|> unloop 10
mainPi 27
3141592653 :10
5897932384 :20
6264338 :27
Finally the complete snippet there
type F = F of bigint * bigint * bigint * bigint
let ns =
Seq.unfold (fun k -> Some(F (k, 4I * k + 2I, 0I, 2I*k + 1I),k+1I) ) 1I
let unloop n =
Seq.map (fun y -> char (48 + y))
>> Seq.chunkBySize n
>> Seq.scan (fun (s,_) l -> let s = s + Array.length l
(s,sprintf "%-10s\t:%d\n" (System.String l) s)) (0,"")
>> Seq.iter (snd >> System.Console.Write)
let flr x (F (q,r,s,t)) = (q*x + r) / (s*x + t)
let comp (F (q,r,s,t)) (F (u,v,w,x)) = F (q*u + r*w, q*v + r*x, s*u + t*w, s*v + t*x)
let str (n,k,z) = // n : remaining digits to compute, k : to be consumed
let y = flr 3I z
let yt = flr 4I z
match n,y with
| 0,_ -> None
| _,y when y = yt -> Some(Some (int y),(n-1,k ,comp (F(10I,(-10I*y),0I,1I)) z))
| _,_ -> Some(None ,(n ,k+1I,comp z (F(k,(4I*k+2I),0I,(2I*k+1I)))))
let mainPi n =
Seq.unfold str (n,1I,F(1I,0I,0I,1I))
|> Seq.choose id
|> unloop 10
plot mainPi [1000..1000..10000]
And that’s about the best we could get at this without… a native bigint implementation. In actual .net core/.net framework, bigint is managed and NOT native. If we want the “native” bigint, we have to go a little hackish.
Mutable/native land or “when all hell breaks loose”
Let’s get the native library (should work on linux/mac/windows there), doc
#load "Paket.fsx"
Paket.Package
["Math.Gmp.Native.NET"]
#load "Paket.Generated.Refs.fsx"
And now let’s get to business.
open Math.Gmp.Native
// magic to get our mutable bigint
let inline init() =
let mutable result = mpz_t()
gmp_lib.mpz_init(result)
result
// That's our z
let mutable q = init()
let mutable r = init()
let mutable s = init()
let mutable t = init()
// temporaries
let mutable u = init()
let mutable v = init()
let mutable w = init()
// initial value F(1,0,0,1)
let reset() =
gmp_lib.mpz_set_si(q,1)
gmp_lib.mpz_set_si(r,0)
gmp_lib.mpz_set_si(s,0)
gmp_lib.mpz_set_si(t,1)
// z <- comp x z
let inline compR bq br bs bt =
gmp_lib.mpz_mul_si(u, r, bs)
gmp_lib.mpz_mul_si(r, r, bq)
gmp_lib.mpz_mul_si(v, t, br)
gmp_lib.mpz_add(r, r, v)
gmp_lib.mpz_mul_si(t, t, bt)
gmp_lib.mpz_add(t, t, u)
gmp_lib.mpz_mul_si(s, s, bt)
gmp_lib.mpz_mul_si(u, q, bs)
gmp_lib.mpz_add(s, s, u)
gmp_lib.mpz_mul_si(q, q, bq)
// z <- comp z x
let inline compL bq br bs bt =
gmp_lib.mpz_mul_si(r, r, bt)
gmp_lib.mpz_mul_si(u, q, br)
gmp_lib.mpz_add(r, r, u)
gmp_lib.mpz_mul_si(u, t, bs)
gmp_lib.mpz_mul_si(t, t, bt)
gmp_lib.mpz_mul_si(v, s, br)
gmp_lib.mpz_add(t, t, v)
gmp_lib.mpz_mul_si(s, s, bq)
gmp_lib.mpz_add(s, s, u)
gmp_lib.mpz_mul_si(q, q, bq)
// flr j z
let inline flr j =
gmp_lib.mpz_mul_si(u, q, j)
gmp_lib.mpz_add(u, u, r)
gmp_lib.mpz_mul_si(v, s , j)
gmp_lib.mpz_add(v, v, t)
gmp_lib.mpz_tdiv_q(w, u, v)
gmp_lib.mpz_get_si(w)
What’s all this fuss is about? Now we store as mutable (native) variables the four bigint of our former z
as a global state. In order to mutate this z
-state with the result of the comp
function, we have to provide two differents mutating functions, depending on where z
is in the comp
arguments list, on the right or on the left. The u,v,w
are temporaries. The native functions gmp_lib.mpz_[add,mul,tdiv,get]
are always storing the results in one of their arguments (generally, the first one, passed by reference), that’s a pretty standard API for C
libraries.
Now let’s modify a bit our (un)folder function to account for mutation side effects for z
.
let str (n,k) = // n : remaining digits to compute, k : to be consumed
let y = flr 3
let yt = flr 4
match n,y with
| 0,_ -> None
| _,y when y = yt -> compR 10 (-10*y) 0 1 ;Some(Some (int y),(n-1,k))
| _,_ -> compL k (4*k+2) 0 (2*k+1);Some(None ,(n ,k+1))
let mainPi n =
reset()
Seq.unfold str (n,1)
|> Seq.choose id
|> unloop 10
mainPi 27
3141592653 :10
5897932384 :20
6264338 :27
plot mainPi [1000..1000..10000]
Now, performance-wise, we’re in the same ballpark than the top-tier languages, just something like 2x the pure C/C++ timings.
Final touch
Let’s get back to a sequence expression for the sake of comparison
let rec str k =
seq {
let y = flr 3
let yt = flr 4
match y with
| y when y = yt -> compR 10 (-10*y) 0 1 ;yield y; yield! str k
| _ -> compL k (4*k+2) 0 (2*k+1); yield! str (k+1)
}
let mainPi n =
reset()
str 1
|> Seq.take n
|> unloop 10
mainPi 27
3141592653 :10
5897932384 :20
6264338 :27
plot mainPi [1000..1000..10000]
Conclusion
The road from haskell to F#-idiomatic and F#-pragmatic (native collided) is a little bumpy one. We could import high-performance “native” code and still get a nice FP touch over it, thanks to the pragmatic approach of the language.