 # Finding the Nth Root

28 Apr 2016, 16:06 UTC

## The Nth-root Algorithm

The Nth-root Algorithm is described on Wikipedia. It requires an initial guess, and then Newton-Raphson iterations are taken to improve that guess.

For instance, to refine a cube root, we take an initial guess and then apply this refinement procedure to it:

```cpp
float refine3(float x, float est)
{
    return (1.0f/3.0f)*(2*est + x/(est*est));
}
```

Here `x` is the value we're finding the cube root of, and `est` is the current estimate; the refined estimate is returned.

Newton-Raphson generally doubles the number of valid bits in each iteration. So if we have only 3 valid bits (relative error < 12.5%) then we only need to apply NR three times to refine it from 3 -> 6 -> 12 -> 24 bits, which is the limit of precision for a 32-bit float.
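To make the bit-doubling concrete, here is a small self-contained sketch (the `refine3` helper is reproduced so the snippet compiles on its own, and the 12%-off starting guess is just an illustrative choice of mine): starting from a guess with about 12% relative error, three refinements recover cbrt(27) = 3 to full float precision.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson refinement step for the cube root, as above.
float refine3(float x, float est)
{
    return (1.0f/3.0f)*(2*est + x/(est*est));
}

// Start ~12% away from the true root cbrt(27) = 3 and refine three times.
float demo_converge()
{
    float est = 3.0f * 1.12f;   // deliberately poor initial guess
    est = refine3(27.0f, est);
    est = refine3(27.0f, est);
    est = refine3(27.0f, est);
    return est;                 // now accurate to float precision
}
```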

But how do we obtain a good initial estimate that's no more than 12.5% out?

That is the main subject of this article, and my solution requires floating-point bit-twiddling hacks.

## The integer representation of floats

It's easy to grab the integer representation of a float in C++, and to convert back from an integer representation to a float (in assembly, it's even easier!):

```cpp
union IntFloat {
    int i;
    float f;
};

int getIntFromFloat(float f) {
    IntFloat u;
    u.f = f;
    return u.i;
}

float getFloatFromInt(int i) {
    IntFloat u;
    u.i = i;
    return u.f;
}
```

(Note that C++ strict-aliasing rules forbid doing this with pointer casts; type-punning through a union is the usual idiom, and all major compilers support it, though strictly portable C++ requires `memcpy` or C++20's `std::bit_cast`.)

What does the format look like? Well, the bit layout of a 32-bit float is as follows:

```
SEEEEEEE EMMMMMMM MMMMMMMM MMMMMMMM

S = sign (0 = +ve, 1 = -ve)
E = exponent, biased by 127
M = 23-bit mantissa

Represents 2^(E-127)*1.MMMMMMMMMMMMMMMMMMMMMMM
```

When E=127, this simply represents the binary fraction 1.MMMMMMMMMMMMMMMMMMMMMMM, which is a number between 1 (M=0) and just slightly under 2 (M=11111111111111111111111).

The representation of the "slightly under 2" number is 00111111 11111111 11111111 11111111. If we add 1 to this we get the representation of the next number: 01000000 00000000 00000000 00000000. This has E=128, M=0, and represents the number 2: 2^1*1.00000000000000000000000 = 10.0000000000000000000000. In general, a float with E=128 represents a number between 2 (M=0) and just slightly under 4 (M=11111111111111111111111).

E=129 then represents numbers between 4 and slightly under 8, and so on.

Over the whole range, this is a continuous mapping: the represented number goes up in small steps each time M increments; when M wraps around, E increments and the step size doubles (so the range covered by that exponent, over all values of M, also doubles).

Not only is it a continuous mapping, but every time M goes to 0, the integer representation of x is *exactly* (log2(x) + 127) * 2^23. This means that (modulo the sign bit, and the exponent bias of 127) the integer representation of a float is *approximately* equal to an s9:23 fixed-point representation of log2(x), and the approximation is exact whenever M=0. Effectively we're taking the curve of log2(x), sampling it at the points where it takes integer values (in y), and drawing straight-line segments between those points. The approximation is globally very good (viewed from a distance, the two curves blur together), but locally there are small errors, since a straight line can't follow the actual log2 curve when viewed close up.

The following code masks off the sign bit and fixes the exponent bias, giving an s9:23 fixed-point estimate for log2(|f|) (log2(f) isn't defined for f <= 0):

```cpp
int log2_approx_9_23 = (getIntFromFloat(f) & 0x7FFFFFFF) - 0x3F800000;
```

We can also go the other way. Given an approximate s9:23 log like this, we can go back to a float. This is the exact inverse of the operation above; we are approximating the function pow(2.0f, i / float(1 << 23)):

```cpp
float pow2_approx_from_9_23 = getFloatFromInt(i + 0x3F800000);
```
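As a quick sanity check on both directions, here is a self-contained sketch (the union helpers from above are reproduced; returning the fixed-point values as `float` for readability is my own choice). The estimate is exact at powers of two, and within about 0.09 in log2 units elsewhere:

```cpp
#include <cassert>
#include <cmath>

union IntFloat { int i; float f; };
int getIntFromFloat(float f) { IntFloat u; u.f = f; return u.i; }
float getFloatFromInt(int i) { IntFloat u; u.i = i; return u.f; }

// s9:23 fixed-point estimate of log2(f), converted to a float for convenience.
float log2_approx(float f)
{
    int fixed = (getIntFromFloat(f) & 0x7FFFFFFF) - 0x3F800000;
    return fixed / float(1 << 23);
}

// Inverse direction: approximate pow(2, x) from the same fixed-point view.
float pow2_approx(float x)
{
    return getFloatFromInt(int(x * (1 << 23)) + 0x3F800000);
}
```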

It's now very easy to approximate the Nth root of a float:

```cpp
int log2 = (getIntFromFloat(f) & 0x7FFFFFFF) - 0x3F800000;
int log2_nthroot = log2 / N;
return getFloatFromInt(log2_nthroot + 0x3F800000);
```

And that's all there is to it! There are no range issues: denormals (including zero) won't come out very accurately, and infinities and NaNs produce finite Nth roots here, but for normalized floats the operation compresses the overall float range, so we don't have to worry about integer overflow in any of these operations.

We can refine this a little with sign bits (if N is even, we should take the Nth root of |x| anyway; if N is odd we should save the sign of x, and then restore it to the result, so e.g. cube root of -8 gives -2.)

The astounding thing is how well this works. Over the range of all normalized floats (E=0 and E=255 are special cases, which we ignore) the error in the approximation is only about 6%, regardless of the value of N. This means we have 4 good bits in the result, so applying Newton-Raphson refinement three times gives us full float precision (modulo rounding errors in the NR algorithm itself).
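The ~6% figure can be checked empirically. The following self-contained sweep (the sampling range and 1% geometric step are arbitrary choices of mine, not from the original) measures the worst relative error of the plain-division estimate over a logarithmic sample of normalized floats:

```cpp
#include <cassert>
#include <cmath>

union IntFloat { int i; float f; };
int getIntFromFloat(float f) { IntFloat u; u.f = f; return u.i; }
float getFloatFromInt(int i) { IntFloat u; u.i = i; return u.f; }

// The Nth-root estimate from the article: divide the approximate log2 by N.
float nthRootApprox(float f, int N)
{
    int log2 = (getIntFromFloat(f) & 0x7FFFFFFF) - 0x3F800000;
    return getFloatFromInt(log2 / N + 0x3F800000);
}

// Worst relative error over a logarithmic sample of positive normal floats.
float maxRelError(int N)
{
    float worst = 0;
    for (float x = 1e-20f; x < 1e20f; x *= 1.01f) {
        float approx = nthRootApprox(x, N);
        float exact = std::pow(x, 1.0f / N);
        float err = std::fabs(approx - exact) / exact;
        if (err > worst) worst = err;
    }
    return worst;
}
```

The worst case should land at around 6% for small N, matching the claim above.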

In fact, we can be even more approximate. For cube-root, instead of dividing by 3, we can approximate a divide by 3 as follows:

```cpp
log2_nthroot = (log2 >> 20) * 0x55555;  // 0x55555 is approximately 2^20/3
```

Amazingly this hardly affects the accuracy at all! Similar approximations can be made for other Nth roots than 3.

Finally, instead of computing `(i - 0x3F800000) / 3 + 0x3F800000` we can notice that this equals `i/3 + 2*0x3F800000/3` = `i/3 + 0x2A555555`.

Our reduced algorithm (for cube-root) is then:

```cpp
float cuberootapprox(float x)
{
    int i = getIntFromFloat(x);
    int sign = i & 0x80000000;
    i &= 0x7FFFFFFF;
    i = (i >> 20) * 0x55555;    // approximate division by 3
    i += 0x2A555555;            // fold the exponent bias back in
    return getFloatFromInt(i | sign);
}
```
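As a quick standalone usage check (the helpers are reproduced so the snippet compiles by itself; the sample inputs are my own), the estimate lands within a few percent of the true roots:

```cpp
#include <cassert>
#include <cmath>

union IntFloat { int i; float f; };
int getIntFromFloat(float f) { IntFloat u; u.f = f; return u.i; }
float getFloatFromInt(int i) { IntFloat u; u.i = i; return u.f; }

// The reduced cube-root estimate: multiply-based divide-by-3, folded-in bias.
float cuberootapprox(float x)
{
    int i = getIntFromFloat(x);
    int sign = i & 0x80000000;
    i &= 0x7FFFFFFF;
    i = (i >> 20) * 0x55555;
    i += 0x2A555555;
    return getFloatFromInt(i | sign);
}
```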

## Denormals

A denormal is a number not in the above format. For a denormal, E = 0, and the number has no "hidden 1 bit":

```
S0000000 0MMMMMMM MMMMMMMM MMMMMMMM

S = sign (0 = +ve, 1 = -ve)
M = 23-bit mantissa

Represents 2^-126*0.MMMMMMMMMMMMMMMMMMMMMMM
```

This is essentially the segment E = 1, continued backwards without changing the step size, until it hits zero. The smallest possible non-zero value has M=1 and this has value 2^-149.
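A tiny standalone check of that claim (the `getFloatFromInt` helper is reproduced): reinterpreting the integer 1 as a float should yield exactly 2^-149, and the all-ones 23-bit mantissa gives the largest denormal, still below the smallest normal 2^-126.

```cpp
#include <cassert>
#include <cmath>

union IntFloat { int i; float f; };
float getFloatFromInt(int i) { IntFloat u; u.i = i; return u.f; }

// E=0, M=1 should decode to 2^-149; E=0, M=0x7FFFFF is the largest denormal.
float smallestDenormal() { return getFloatFromInt(1); }
float largestDenormal()  { return getFloatFromInt(0x007FFFFF); }
```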

We probably don't care about denormals very much, except for zero. We would like the Nth root of zero to be zero. The following simple fix flushes all denormals to zero:

```cpp
float cuberootapprox(float x)
{
    int i = getIntFromFloat(x);
    int sign = i & 0x80000000;
    i &= 0x7FFFFFFF;
    if (i <= 0x007FFFFF) {
        return 0;               // flush denormals (and zero) to zero
    } else {
        i = (i >> 20) * 0x55555;
        i += 0x2A555555;
        return getFloatFromInt(i | sign);
    }
}
```

Alternatively, we can handle denormals properly. To do this we apply the same estimate to the integer M instead of to x itself. Since x is M * 2^-149, we then multiply the estimate obtained from M by the Nth root of 2^-149. For a cube root this is cbrt(2 * 2^-150) = cbrt(2) * 2^-50:

```cpp
float cuberootapprox(float x)
{
    int i = getIntFromFloat(x);
    int sign = i & 0x80000000;
    i &= 0x7FFFFFFF;
    if (!i) {
        return 0;
    } else if (i <= 0x007FFFFF) {
        i = getIntFromFloat(float(i));  // estimate from M instead of x
        i = (i >> 20) * 0x55555;
        i += 0x2A555555;
        i -= 50 << 23;                  // scale by 2^-50
        return getFloatFromInt(i | sign) * 1.2599210498948731647672106072782f;  // * cbrt(2)
    } else {
        i = (i >> 20) * 0x55555;
        i += 0x2A555555;
        return getFloatFromInt(i | sign);
    }
}
```

For SIMDification, all these branches can be combined into one basic sequence (note that 0x11555555 = 0x2A555555 - (50 << 23)):

```cpp
float cuberootapprox(float x)
{
    int i = getIntFromFloat(x);
    int sign = i & 0x80000000;
    i &= 0x7FFFFFFF;
    bool denormal = (i <= 0x007FFFFF);
    bool zero = (i == 0);
    int adder = denormal ? 0x11555555 : 0x2A555555;
    float multiplier = denormal ? 1.2599210498948731647672106072782f : 1;
    multiplier = zero ? 0 : multiplier;
    i = denormal ? getIntFromFloat(float(i)) : i;
    i = (i >> 20) * 0x55555;
    i += adder;
    return getFloatFromInt(i | sign) * multiplier;
}
```

And, since the error is always less than 6%, we can now get an accurate result just by blindly applying Newton-Raphson three times:

```cpp
float cuberoot(float x)
{
    float est = cuberootapprox(x);
    est = refine3(x, est);
    est = refine3(x, est);
    est = refine3(x, est);
    return est;
}
```
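Putting the whole pipeline together in one self-contained sketch (all helpers reproduced from above; for brevity this uses the flush-denormals-to-zero variant of the estimate):

```cpp
#include <cassert>
#include <cmath>

union IntFloat { int i; float f; };
int getIntFromFloat(float f) { IntFloat u; u.f = f; return u.i; }
float getFloatFromInt(int i) { IntFloat u; u.i = i; return u.f; }

// One Newton-Raphson refinement step for the cube root.
float refine3(float x, float est)
{
    return (1.0f/3.0f)*(2*est + x/(est*est));
}

// Bit-hack estimate (flush-to-zero variant) plus three Newton-Raphson steps.
float cuberoot(float x)
{
    int i = getIntFromFloat(x);
    int sign = i & 0x80000000;
    i &= 0x7FFFFFFF;
    if (i <= 0x007FFFFF)
        return 0;               // denormals (and zero) flush to zero
    i = (i >> 20) * 0x55555;    // approximate division of the log by 3
    i += 0x2A555555;            // folded-in exponent bias
    float est = getFloatFromInt(i | sign);
    est = refine3(x, est);
    est = refine3(x, est);
    est = refine3(x, est);
    return est;
}
```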

This gives "OK" results for very small denormals (worst-case relative error 0.000115%), and for all other floating-point numbers it gives a worst-case error of around 0.000045%. This is not quite as good as the C library's pow(x, 1.0f/3.0f), but it is only a few LSBs different, which is good enough for most purposes, and it is fast, especially if implemented using SIMD to act on 4 or 8 floats at a time.

Similar reduced algorithms can be determined for any Nth-root. The only differences are:

1. The Nth root of 2^-149 must be split into a 2^-P part and a part between 1 and 2. P is rolled into the adder constant (for the cube root, 0x11555555 = 0x2A555555 - (50 << 23)); the part between 1 and 2 is rolled into the multiplier.
2. `(i >> 20) * 0x55555` must be changed to approximate division by N instead of by 3. Or just use `i/N`.

And there you have it. A similar approximation can also be made for reciprocal roots (pow(x, -1/N)): the form becomes `i = adder - i/N`. The famous Quake inverse square root approximation has exactly this form, except that `adder` is hand-tweaked to absolutely minimize the estimation error, which was necessary because Quake did only one iteration of NR instead of the required three. With three iterations we don't need to worry about this refinement, but the basic algorithm is fast enough that you could brute-force search for an optimal value here in a few hours on a desktop PC, if that's the kind of thing that interests you!
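For illustration, here is a sketch of that reciprocal-root form for N = 2, using the well-known Quake III constant 0x5F3759DF as the hand-tweaked `adder` and a single Newton-Raphson step (the refinement formula y*(1.5 - 0.5*x*y*y) is the standard NR step for 1/sqrt(x), not something derived in this article):

```cpp
#include <cassert>
#include <cmath>

union IntFloat { int i; float f; };

// `adder - i/N` form for pow(x, -1/2), with the famous hand-tuned constant.
float rsqrtApprox(float x)
{
    IntFloat u;
    u.f = x;
    u.i = 0x5F3759DF - (u.i >> 1);      // initial estimate of x^-0.5
    float y = u.f;
    y = y * (1.5f - 0.5f * x * y * y);  // one Newton-Raphson step
    return y;
}
```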
