# Adventures on pidigits benchmark and F#

In my previous post regarding the pidigits benchmark comparison, I’ve been a little unfair towards F#, which I really like. Now it’s time to amend.

## Background

It’s all there, on the benchmarksgame site.

Adapt the step-by-step algorithm given on pages 4,6 & 7 of [pdf 156KB] Unbounded Spigot Algorithms for the Digits of Pi. (Not the deliberately obscure version given on page 2. Not the Rabinowitz-Wagon algorithm.)

All right, it’s based on this formula:

And then

And then, quoting the paper, we can see this as
which is no other than the composition of a dreadful *inﬁnite series of linear fractional transformations or Möbius transformations* (you never guessed it.)

These are functions taking $x$ to $\frac{q x+r}{s x+t}$ for integers $q,r,s$ and $t$ with $q t-r s \neq 0$—that is, yielding a ratio of integer-coefﬁcient linear transformations of $x$. Such a transformation can be represented by the four coefﬁcients $q,r,s$ and $t$, and if they are arranged as a matrix $\left(\begin{array}{ll}{q} & {r} \ {s} & {t}\end{array}\right)$ then function composition corresponds to matrix multiplication.

## Code

Long story short, a couple of simplifications and factorizations later, we got this beautiful haskell code :

```
-- The Computer Language Benchmarks Game
-- https://salsa.debian.org/benchmarksgame-team/benchmarksgame/
-- by Don Stewart, Einar Karttunen and Branimir Maksimovic
--
import System.Environment
data F = F !Integer !Integer !Integer !Integer
main = loop 10 0 . flip take (str (F 1 0 0 1) ns) . read . head =<< getArgs
ns = [ F k (4*k+2) 0 (2*k+1) | k <- [1..] ]
loop n s [] = putStrLn $ replicate n ' ' ++ "\t:" ++ show s
loop 0 s xs = putStrLn ("\t:"++show s) >> loop 10 s xs
loop n s (x:xs) = putStr (show x) >> loop (n-1) (s+1) xs
flr x (F q r s t) = (q*x + r) `div` (s*x + t)
comp (F q r s t) (F u v w x) = F (q*u+r*w) (q*v+r*x) (s*u+t*w) (s*v+t*x)
str z (x:xs) | y == flr 4 z = y : str (comp (F 10 (-10*y) 0 1) z) (x:xs)
| otherwise = str (comp z x) xs where y = flr 3 z
```

## Some details

Bear in mind that, with this algorithm, somewhere at the fourth or the fifth decimal we already reached the numeric limit of an unsigned 64 bits integer in `z`

. So the implementation should handle “no-limit” integers, known as *big integers*.

Some remarks on the code:

`.`

is composition in haskell.`flip`

is just there to exchange arguments in the composition function for the`take`

function (don’t bother with it)

More important :

- the type
`F`

stores 4`Integer`

(they are big integers in haskell by default) `ns`

is a infinite loop feeding the algorithm`flr`

is the digit extraction`comp`

does the Möbius thing.`str`

is the main recursive loop where`z`

is the state matrix and`x:xs`

the pattern-matched input list. It checks if the extracted digit on z has the property of being the next “pi” digit. If so don’t consume input, update the state for base-10, otherwise consume it and do the Möbius-thing next.`loop`

does the printing, only printing digits by`n`

-sized groups and printing the number of already computing so far.`main`

puts everything in motion, gets the command-line (number of asked digits), launchs the recursive`str`

loop with the initial state and prints it as it goes with the composed`loop`

function.

## Ok, now let’s do it in F#

First, some utility functions. This is to measure the computation time

```
let duration f =
let timer = new System.Diagnostics.Stopwatch()
timer.Start()
let oldout = System.Console.Out
System.Console.SetOut(System.IO.TextWriter.Null) // Redirect the output
let returnValue = f()
System.Console.SetOut(oldout)
let sec = (float timer.ElapsedMilliseconds) / 1000.0
sec
```

And this is to plot a bar chart of the computation times wrt the number of digits

```
// #load "XPlot.Plotly.Paket.fsx"
#load "XPlot.Plotly.fsx"
open XPlot.Plotly
let plot f range =
range
|> Seq.head
|> (fun x -> duration (fun _ -> f x))
|> ignore // Dry-run the first run to filter out the jit-compilation time on the next ones
let data =
range
|> Seq.map (fun x -> (x,duration (fun _ -> f x)))
let c =
Chart.Bar data
|> Chart.WithXTitle "Time in seconds"
|> Chart.WithYTitle "Number of digits"
c
```

### First attempt, tail recursion woes

```
type F = F of bigint * bigint * bigint * bigint
let ns =
Seq.unfold (fun k -> Some(F (k, 4I * k + 2I, 0I, 2I*k + 1I),k+1I) ) 1I
let rec loop = function
| n, s, l when Seq.isEmpty l -> printfn "%s\t:%d" (String.replicate n " ") s
| 0, s, xs -> printfn "\t:%d" s;loop (10,s,xs)
| n, s, l -> printf "%d" (Seq.head l); loop ((n-1),(s+1),(Seq.tail l |> Seq.cache))
let flr x (F (q,r,s,t)) = (q*x + r) / (s*x + t)
let comp (F (q,r,s,t)) (F (u,v,w,x)) = F (q*u + r*w, q*v + r*x, s*u + t*w, s*v + t*x)
let rec str z l =
let y = flr 3I z
match z,l with
| z, l when y = flr 4I z -> seq { yield (int y); yield! str (comp (F (10I,-10I*y,0I,1I)) z) l }
| z, l -> str (comp z (Seq.head l)) (Seq.tail l |> Seq.cache)
let mainPi n =
ns
|> str (F(1I,0I,0I,1I))
|> Seq.take n
|> Seq.cache
|> (fun x -> loop (10,0,x))
```

```
mainPi 27
```

```
3141592653 :10
5897932384 :20
6264338 :27
```

Pretty straightforward from the haskell implementation uh? Except that… if we run

```
mainPi 5000
```

we got a `StackOverflowException`

.

And that’s because the `Seq.tail |> Seq.cache`

doesn’t trigger the tail-recursion optimization so the cached `Seq.tail`

are stored on the stack, over and over. And that’s one of our first gotcha in F#: `Seq`

are *NOT* lists as in usual functional programming way. Hence, we should ditch the `Seq.tail`

thing. F#’s `List`

is eager, so it doesn’t lazy-evaluate in a nice functionnal “pipelining” way. We could define our `LazyList<T>`

to do that, for a start.

### Lazyness (almost) everywhere

Let’s define our `LazyList`

type with the usual car/cdr semantics, the only difference with a “normal” list type is the “lazy” cdr. Documentation for the lazy evaluation in F#. And add some usual “list” functions for good measure.

```
type LazyList<'a> =
| Empty // Nil
| Cons of 'a * Lazy<LazyList<'a>> with // Cons (Car, Cdr)
// Generator
static member unfold (f : 's -> ('a*'s) option) (init : 's) : LazyList<'a> =
match f init with
| None -> Empty
| Some (a, s) -> Cons (a, lazy ( LazyList.unfold f s))
// Selecting n first elements
static member take n (l : LazyList<'a>) : LazyList<'a> =
match l with
| Empty -> Empty
| Cons (a, t) -> if n = 0 then Empty
else Cons (a, lazy (LazyList.take (n-1) (t.Force())))
```

Now we convert all `Seq<T>`

functions to `LasyList<T>`

, and we could do a little more pattern matching along the way.

```
type F = F of bigint * bigint * bigint * bigint
let ns =
LazyList.unfold (fun k -> Some(F (k, 4I * k + 2I, 0I, 2I*k + 1I),k+1I) ) 1I
let rec loop = function
| n, s, Empty -> printfn "%s\t:%d" (String.replicate n " ") s
| 0, s, xs -> printfn "\t:%d" s;loop (10,s,xs)
| n, s, Cons (x,xs) -> printf "%d" x;loop (n-1,s+1,xs.Force())
let flr x (F (q,r,s,t)) = (q*x + r) / (s*x + t)
let comp (F (q,r,s,t)) (F (u,v,w,x)) = F (q*u + r*w, q*v + r*x, s*u + t*w, s*v + t*x)
let rec str z l =
let y = flr 3I z
match z,l with
| z, l when y = flr 4I z -> Cons(int y,lazy(str (comp (F (10I,-10I*y,0I,1I)) z) l))
| z, Cons(x,xs) -> str (comp z x) (xs.Force())
| z, Empty -> str z l
let mainPi n =
ns
|> str (F(1I,0I,0I,1I))
|> LazyList.take n
|> (fun x -> loop (10,0,x))
```

```
mainPi 27
```

```
3141592653 :10
5897932384 :20
6264338 :27
```

The only real changes from the first version with `Seq`

is that we now enforce proper tail-recursion via the `Lazy<T>.Force()`

call.

```
plot mainPi [1000..1000..10000]
```

And that’s a very nice improvement over the `Seq.tail`

version. We’re still leagues away from the haskell performance, more on that later.

## No, we’re not that Lazy

Now we’re wondering if we could go a little more F#-idiomatic, without resorting to `LazyList`

. First, let’s take care of the printing with a nice `Seq.scan`

Like

`Seq.fold`

, but computes on-demand and returns the sequence of intermediate and final results.

```
// Signature:
Seq.scan : ('State -> 'T -> 'State) -> 'State -> seq<'T> -> seq<'State>
// Usage:
Seq.scan folder state source
```

```
let unloop n =
let fmt = Printf.StringFormat<string->int->string>("%-" + (string n) + "s\t:%d\n")
Seq.map (fun y -> char (48 + y))
>> Seq.chunkBySize n
>> Seq.scan (fun (s,_) l -> let s = s + Array.length l
(s,sprintf fmt (System.String l) s)) (0,"")
>> Seq.iter (snd >> System.Console.Write)
```

It first “transforms” digits as bytes, groups them by array of 10 and accumulate them with `Seq.scan`

with the number of digits with the current output of ten digits nicely formatted as state and finally just write those outputs to the console
(The `StringFormat`

trick is documented there)

```
seq [ 3; 4; 4; 1; 2; 4; 5 ] |> unloop 3
```

```
344 :3
124 :6
5 :7
```

Now what if instead of just generating the sequence of `k`

consumed by the `comp`

function, we move all the computation inside our `Seq.unfold`

generator ? As for the unfold function state, we have just to store `k`

and `z`

Returns a sequence that contains the elements generated by the given computation.

```
// Signature:
Seq.unfold : ('State -> ('T * 'State) option) -> 'State -> seq<'T>
// Usage:
Seq.unfold generator state
```

All we have to do is to store in our state the remaining digits to compute, the `k`

and the `z`

accumulator. When we got a digit we “yield” a `Some int`

otherwise a None. When `n`

reachs 0 we stop. In the main call, we filter out all the `None`

values.

```
let str (n,k,z) = // n : remaining digits to compute, k : to be consumed
let y = flr 3I z
let yt = flr 4I z
match n,y with
| 0,_ -> None
| _,y when y = yt -> Some(Some (int y),(n-1,k ,comp (F(10I,(-10I*y),0I,1I)) z))
| _,_ -> Some(None ,(n ,k+1I,comp z (F(k,(4I*k+2I),0I,(2I*k+1I)))))
let mainPi n =
Seq.unfold str (n,1I,F(1I,0I,0I,1I))
|> Seq.choose id
|> unloop 10
```

```
mainPi 27
```

```
3141592653 :10
5897932384 :20
6264338 :27
```

Finally the complete snippet there

```
type F = F of bigint * bigint * bigint * bigint
let ns =
Seq.unfold (fun k -> Some(F (k, 4I * k + 2I, 0I, 2I*k + 1I),k+1I) ) 1I
let unloop n =
Seq.map (fun y -> char (48 + y))
>> Seq.chunkBySize n
>> Seq.scan (fun (s,_) l -> let s = s + Array.length l
(s,sprintf "%-10s\t:%d\n" (System.String l) s)) (0,"")
>> Seq.iter (snd >> System.Console.Write)
let flr x (F (q,r,s,t)) = (q*x + r) / (s*x + t)
let comp (F (q,r,s,t)) (F (u,v,w,x)) = F (q*u + r*w, q*v + r*x, s*u + t*w, s*v + t*x)
let str (n,k,z) = // n : remaining digits to compute, k : to be consumed
let y = flr 3I z
let yt = flr 4I z
match n,y with
| 0,_ -> None
| _,y when y = yt -> Some(Some (int y),(n-1,k ,comp (F(10I,(-10I*y),0I,1I)) z))
| _,_ -> Some(None ,(n ,k+1I,comp z (F(k,(4I*k+2I),0I,(2I*k+1I)))))
let mainPi n =
Seq.unfold str (n,1I,F(1I,0I,0I,1I))
|> Seq.choose id
|> unloop 10
```

```
plot mainPi [1000..1000..10000]
```

And that’s about the best we could get at this without… a native bigint implementation. In actual .net core/.net framework, bigint is managed and NOT native. If we want the “native” bigint, we have to go a little hackish.

## Mutable/native land or “when all hell breaks loose”

Let’s get the native library (should work on linux/mac/windows there), doc

```
#load "Paket.fsx"
Paket.Package
["Math.Gmp.Native.NET"]
#load "Paket.Generated.Refs.fsx"
```

And now let’s get to business.

```
open Math.Gmp.Native
// magic to get our mutable bigint
let inline init() =
let mutable result = mpz_t()
gmp_lib.mpz_init(result)
result
// That's our z
let mutable q = init()
let mutable r = init()
let mutable s = init()
let mutable t = init()
// temporaries
let mutable u = init()
let mutable v = init()
let mutable w = init()
// initial value F(1,0,0,1)
let reset() =
gmp_lib.mpz_set_si(q,1)
gmp_lib.mpz_set_si(r,0)
gmp_lib.mpz_set_si(s,0)
gmp_lib.mpz_set_si(t,1)
// z <- comp x z
let inline compR bq br bs bt =
gmp_lib.mpz_mul_si(u, r, bs)
gmp_lib.mpz_mul_si(r, r, bq)
gmp_lib.mpz_mul_si(v, t, br)
gmp_lib.mpz_add(r, r, v)
gmp_lib.mpz_mul_si(t, t, bt)
gmp_lib.mpz_add(t, t, u)
gmp_lib.mpz_mul_si(s, s, bt)
gmp_lib.mpz_mul_si(u, q, bs)
gmp_lib.mpz_add(s, s, u)
gmp_lib.mpz_mul_si(q, q, bq)
// z <- comp z x
let inline compL bq br bs bt =
gmp_lib.mpz_mul_si(r, r, bt)
gmp_lib.mpz_mul_si(u, q, br)
gmp_lib.mpz_add(r, r, u)
gmp_lib.mpz_mul_si(u, t, bs)
gmp_lib.mpz_mul_si(t, t, bt)
gmp_lib.mpz_mul_si(v, s, br)
gmp_lib.mpz_add(t, t, v)
gmp_lib.mpz_mul_si(s, s, bq)
gmp_lib.mpz_add(s, s, u)
gmp_lib.mpz_mul_si(q, q, bq)
// flr j z
let inline flr j =
gmp_lib.mpz_mul_si(u, q, j)
gmp_lib.mpz_add(u, u, r)
gmp_lib.mpz_mul_si(v, s , j)
gmp_lib.mpz_add(v, v, t)
gmp_lib.mpz_tdiv_q(w, u, v)
gmp_lib.mpz_get_si(w)
```

What’s all this fuss is about? Now we store as mutable (native) variables the four bigint of our former `z`

as a global state. In order to mutate this `z`

-state with the result of the `comp`

function, we have to provide two differents mutating functions, depending on where `z`

is in the `comp`

arguments list, on the right or on the left. The `u,v,w`

are temporaries. The native functions `gmp_lib.mpz_[add,mul,tdiv,get]`

are always storing the results in one of their arguments (generally, the first one, passed by reference), that’s a pretty standard API for `C`

libraries.

Now let’s modify a bit our (un)folder function to account for mutation side effects for `z`

.

```
let str (n,k) = // n : remaining digits to compute, k : to be consumed
let y = flr 3
let yt = flr 4
match n,y with
| 0,_ -> None
| _,y when y = yt -> compR 10 (-10*y) 0 1 ;Some(Some (int y),(n-1,k))
| _,_ -> compL k (4*k+2) 0 (2*k+1);Some(None ,(n ,k+1))
let mainPi n =
reset()
Seq.unfold str (n,1)
|> Seq.choose id
|> unloop 10
```

```
mainPi 27
```

```
3141592653 :10
5897932384 :20
6264338 :27
```

```
plot mainPi [1000..1000..10000]
```

Now, performance-wise, we’re in the same ballpark than the top-tier languages, just something like 2x the pure C/C++ timings.

## Final touch

Let’s get back to a sequence expression for the sake of comparison

```
let rec str k =
seq {
let y = flr 3
let yt = flr 4
match y with
| y when y = yt -> compR 10 (-10*y) 0 1 ;yield y; yield! str k
| _ -> compL k (4*k+2) 0 (2*k+1); yield! str (k+1)
}
let mainPi n =
reset()
str 1
|> Seq.take n
|> unloop 10
```

```
mainPi 27
```

```
3141592653 :10
5897932384 :20
6264338 :27
```

```
plot mainPi [1000..1000..10000]
```

## Conclusion

The road from haskell to F#-idiomatic and F#-pragmatic (native collided) is a little bumpy one. We could import high-performance “native” code and still get a nice FP touch over it, thanks to the pragmatic approach of the language.