Martin Guy after C. Woo
ABSTRACT
A fast algorithm for calculating integer (and hence floating point) square roots is presented, with implementations in some popular and unpopular programming languages.
It all started when I was given a fifteen-row abacus by my grandmother, with a book by a Mr C. Woo[1], a name almost too good to be true, on how to work it to do addition, subtraction, multiplication, division and square and cube roots. I was amazed. With some practice, I could do 10-digit square roots, giving remainder if not exact, in five or ten minutes.
Mr Woo's algorithm on a decimal abacus is a follows:
Once the algorithm is reduced to a binary equivalent, dealing with pairs of bits at a time, nice properties drop out like:
Tests of the integer version in compiled C on the VAX against the VAX floating point one in the new hand-crafted assembly language -lnm show the integer one to be six times as fast as the one in the library.
Remember that incrementing a binary floating point exponent and a 1-bit shift can instantly align a floating point number to a 2-bit boundary. Change in the mantissa is simply achieved by multiplying it by two!
Here is the code in C:
/*
* Square root by abacus algorithm, Martin Guy @ UKC, June 1985.
* From a book on programming abaci by Mr C. Woo.
* Argument is a positive integer, as is result.
*
* I have formally proved that on exit:
* 2 2 2
* res <= x < (res+1) and res + op == x
*
* This is also nine times faster than the library routine (-lm).
*/
int
sqrt(x)
int x;
{
/*
* Logically, these are unsigned. We need the sign bit to test
* whether (op - res - one) underflowed.
*/
register int op, res, one;
op = x;
res = 0;
/* "one" starts at the highest power of four <= than the argument. */
one = 1 << 30; /* second-to-top bit set */
while (one > op) one >>= 2;
while (one != 0) {
if (op >= res + one) {
op = op - (res + one);
res = res + 2 * one;
}
res /= 2;
one /= 4;
}
return(res);
}
Here, in Occam1:
-- Square root engine
--
-- Adapted from C.Woo's magic sqrt.
--
-- Instead of a loop we have a train of loop bodies with a different
-- constant in each instead of a loop variable.
--
-- --- op --- ---
-- arg-->-| 30|-->-| 28|- -| 0 |--> remainder
-- |2 | |2 | ... |2 |
-- 0..-->-| |-->-| |- -| |--> root
-- --- res --- ---
PROC sqrt (CHAN in, out.res, out.rem) =
PROC sqrt.cell(CHAN in.op, in.res, out.op, out.res, VALUE n) =
-- n determines "one", one = 4^n
VAR one, op, res, op2:
SEQ
one := 1 << (2 * n)
WHILE TRUE
SEQ
PAR
in.op ? op
in.res ? res
op2 := (op - res) - one
IF
(op2 >= 0)
PAR
out.op ! op2
out.res ! (res/2) + one
TRUE
PAR
out.op ! op
out.res ! res/2 :
PROC spew(VALUE n, CHAN out) =
-- repeatedly output the given value
WHILE TRUE
out ! n :
DEF word.size = 32: -- number of bits in machine word
DEF ncells = 16: -- number of sqrt cells required
CHAN chan.op [ncells - 1]: -- channels to connect sqrt cells
CHAN chan.res[ncells - 1]:
CHAN chan0: -- spews 0s into the first res input
PAR
-- set up sqrt cells from right to left
sqrt.cell(chan.op[0], chan.res[0], out.rem, out.res, 0)
PAR i = [ 1 FOR (ncells-2) ]
sqrt.cell(chan.op[i], chan.res[i], chan.op[i-1], chan.res[i-1], i)
sqrt.cell(in, chan0, chan.op[ncells-2], chan.res[ncells-2], ncells-1)
spew(0, chan0) :
And in Imp:
%begin
%integer op, res, one
READ(op)
res = 0; one = 1<<30
%cycle
%exit %if one <= op
one = one // 4
%repeat
%cycle
%exit %if one < 1
%if op - res - one >= 0 %start
op = op - res - one
res = res + 2 * one
%finish
res = res // 2
one = one // 4
%repeat
WRITE (res,3)
PRINTSTRING (" rem ")
WRITE (op,3)
%endofprogram
in Miranda (courtesy of DAT):
||integer square root, by Martin Guy's method
square_root :: num->num
square_root x
= loop x 0 one, x>0
= 0, x=0
= error "square_root of negative arg", otherwise
where
one = last(takewhile(<=x)(iterate (*4) 1))
||largest power of 4 <= x
loop :: num->num->num->num
loop op res one
= res, one=0
= loop op' (res' div 2) (one div 4), otherwise
where
(op',res')
= (op-res-one,res+2*one), op >= res+one
= (op,res), otherwise
||This seems to work ok -- DT