[Ab]using PostgreSQL to calculate \(\pi\)¶
Once the crime of binding the GMP multiple precision arithmetic library into PostgreSQL has been perpetrated, using it to calculate one million digits of \(\pi\) is the least evil.
Calculation is performed using a Machin-like formula, a generalization of the Machin’s formula:
known as one of the most efficient ways to compute a large number of digits of \(\pi\). Specifically we will use the Chien-Lih Hwang 1997 formula (further details and references in the above link).
Calculation is performed using the mpz
data type, thus using integers:
to calculate an approximation of \(\tan^{-1}\frac{1}{x}\) to \(n\)
digits, we use a Taylor series scaled to \(10^n\). Full details are
provided in this nice LiteratePrograms article.
It is possible to implement the arccot()
function using a
common table expression from PostgreSQL 8.4:
CREATE OR REPLACE FUNCTION arccot(x mpz, unity mpz)
RETURNS mpz AS
$$
WITH RECURSIVE t(n, xp) AS (
VALUES (3, $2 / -($1 ^ 3))
UNION ALL
SELECT n+2, xp / -($1 ^ 2)
FROM t
WHERE xp <> 0
)
SELECT sum(xp / n) + ($2 / $1) FROM t;
$$
LANGUAGE sql IMMUTABLE STRICT;
However using this method all the temporary terms in the calculation are kept in memory, and the complex bookkeeping results in a non optimal use of the CPU. The performance can be improved using PL/pgSQL, which may be not the fastest language around, but it’s perfectly suitable for this task, as the time spent in the function body is negligible respect to the time spent in the arithmetic functions with huge operators.
CREATE OR REPLACE FUNCTION arccot(x mpz, unity mpz)
RETURNS mpz AS
$$
DECLARE
xp mpz := unity / x;
xp2 mpz := -(x ^ 2);
acc mpz := xp;
n mpz := 3;
term mpz;
BEGIN
LOOP
xp := xp / xp2;
term := xp / n;
EXIT WHEN term = 0;
acc := acc + term;
n := n + 2;
END LOOP;
RETURN acc;
END
$$
LANGUAGE plpgsql IMMUTABLE STRICT;
And the Machin-like formula using it:
CREATE OR REPLACE FUNCTION pi_hwang_97(ndigits int)
RETURNS mpz AS
$$
DECLARE
unity mpz = 10::mpz ^ (ndigits + 10);
BEGIN
RETURN 4 * (
183 * arccot(239, unity)
+ 32 * arccot(1023, unity)
- 68 * arccot(5832, unity)
+ 12 * arccot(110443, unity)
- 12 * arccot(4841182, unity)
- 100 * arccot(6826318, unity)
) / (10::mpz ^ 10);
END
$$
LANGUAGE plpgsql IMMUTABLE STRICT;
This function produces 200,000 digits of \(\pi\) in 30 sec on my laptop (Intel i5 2.53GHz). As a comparison, the equivalent Python implementation (using the Python built-in long integer implementation) takes almost 4 minutes.
The function above has the shortcoming of using only one of the 4 cores available on my machine. But the formula itself is easy to parallelize as it contains 6 independent terms: we can produce each term in a separate backend process, storing the computation into a table (it’s a database, after all!) and composing the result as a last step. The following Python script implements this idea:
import eventlet
eventlet.patcher.monkey_patch()
import sys
import psycopg2
dsn = 'dbname=regression'
nprocs = 4
ndigits = int(sys.argv[1])
cnn = psycopg2.connect(dsn)
cnn.set_isolation_level(0)
cur = cnn.cursor()
cur.execute("""
drop table if exists pi;
create table pi (mult int4, arccot mpz);
""")
def arccot((mult, arg)):
cnn = psycopg2.connect(dsn)
cnn.set_isolation_level(0)
cur = cnn.cursor()
cur.execute("""
insert into pi
values (%s, arccot(%s, 10::mpz ^ (%s + 10)));
""", (mult, arg, ndigits))
cnn.close()
pool = eventlet.GreenPool(nprocs)
list(pool.imap(arccot, [
(183, 239), (32, 1023), (-68, 5832),
(12, 110443), (-12, 4841182), (-100, 6826318)]))
cur.execute("select 4 * sum(mult * arccot) / (10::mpz ^ 10) from pi;")
print cur.fetchone()[0]
The script uses 4 cores concurrently to calculate the intermediate terms: it produces 200,000 digits in about 13 seconds and 1 million digits in 8 minutes and 30 seconds.