473,323 Members | 1,550 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,323 software developers and data experts.

more hand written integer pow() functions (LONG POST)

/* hello, I have some fairly naive queries here related to optimising code!
I know the first answer is 'don't' but leave that to one side for the
moment.

1) I'm looking for constructive comments on the mul_a() and pow_a() below
comments on style/clarity/portability/obvious efficiency issues are welcome
- any better ways to write them without changing the algorithm.

2) Comments on mul_b() and pow_b() below which attempt to optimise them.

3) any better algorithms - code examples preferred,
- I don't have access to knuth at the moment!

4) I have assumed x and y are non negative integers. If x were allowed to be
negative, should be no portability issues as there are no right shifts on x,
right?

Obviously for the multiply routines only, assume there isn't an in-built
multiply
but you can double, halve and use mod (%) with a 2. A bit artificial perhaps
but
I'm trying to understand when and where you might use shifts and ANDs, etc!

Also assume pow(0,0)=1.

Thanks very much in advance
---
BTW its not a homework problem, if it were I'd go for a variation of

double ipow(double x, int n)
{
return (!n)?1:(n<0)?ipow(1/x,-n):(n&1)?x*ipow(x,n-1):ipow(x*x,n/2);
}

from Mark Stephen on comp.lang.c.moderated a few years back!

which would give:

int kpow(int x, int n)
{
return (!n)?1:(n&1)?x*kpow(x,n-1):kpow(x*x,n/2);
}

Presumably with tail recursion this version is quite efficient?
*/

/*--------------------start of code---------------*/
/*Tested using Borland C++ 5.6.*/

#include <stdio.h>
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/* basic versions */
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/* peasant multiply */
int mul_a(int x, int y)
{
int t=0;

for(;;)
{
if (y%2) t+=x; /* if y is odd... */
if (!(y/=2))break; /* halve y and if result is zero... */
x*=2; /* double x */
}

return t;
}
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/* peasant power */
int pow_a(int x, int y)
{
int t=1;

for(;;)
{
if (y%2) t*=x; /* if y is odd... */
if (!(y/=2))break; /* halve y and if result is zero... */
x*=x; /* square x */
}

return t;
}
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/* better(?) versions */
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/* peasant multiply - attempt to optimise */
int mul_b(int x, int y)
{
int t=0;

for(;;)
{
if (y&1) t+=x; /* if y is odd... */
if (!(y>>=1))break; /* halve y and if result is zero... */
x<<=1; /* double x */
}

return t;
}
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/* peasant power - attempt to optimise */
int pow_b(int x, int y)
{
int t=1;

for(;;)
{
if (y&1) t*=x; /* if y is odd... */
if (!(y>>=1))break; /* halve y and if result is zero... */
x*=x; /* square x */
}

return t;
}
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
void main() {
int n;

putchar('\n');

for( n =0; n<=10; n++)
printf("%5d, %5d, %5d, %5d\n", n, n<<1, n>>1, n&1);

putchar('\n');

for( n =0; n<=5; n++)
printf("%5d, %5d, %5d\n", n, mul_a(n,n), pow_a(n,n));

putchar('\n');

for( n =0; n<=5; n++)
printf("%5d, %5d, %5d\n", n, mul_b(n,n), pow_b(n,n));

putchar('\n');
}
/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*/
/*--------------------end of code-----------------*/

Nov 13 '05 #1
8 2945
Greetings.

In article <3f******@news.greennet.net>, silly wrote:
comments on style/clarity/portability/obvious efficiency issues are
welcome void main() { .... putchar('\n');
}


For a start, the definition should be int main(void), and the function must
return a value (e.g., return(EXIT_SUCCESS);).

--
_
_V.-o Tristan Miller [en,(fr,de,ia)] >< Space is limited
/ |`-' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-= <> In a haiku, so it's hard
(7_\\ http://www.nothingisreal.com/ >< To finish what you
Nov 13 '05 #2
On Sun, 02 Nov 2003 13:14:18 +0000, silly wrote:
hello, I have some fairly naive queries here related to optimising code!
I'm going to pay attention to the pow() functions and ignore the mul()
functions.
/* peasant power */
int pow_a(int x, int y)
{
int t=1;

for(;;)
{
if (y%2) t*=x; /* if y is odd... */
if (!(y/=2))break; /* halve y and if result is zero... */
x*=x; /* square x */
}

return t;
}

/* peasant power - attempt to optimise */
int pow_b(int x, int y)
{
int t=1;

for(;;)
{
if (y&1) t*=x; /* if y is odd... */
if (!(y>>=1))break; /* halve y and if result is zero... */
x*=x; /* square x */
}

return t;
}
This code is the C equivalent of some IBM OS/360 code that Glen
Herrmannsfeldt posted here last week.

PLUS LD FACTOR,ONE LOAD FACTOR OF ONE IN FACTOR REG
LOOP SRDL EXPN,1 SHIFT LOW BIT EXPN REG INTO ADDR REG
LTR ADDR,ADDR TEST SIGN POS ADDR REG FOR MINUS BIT
BC 10,JUMP IF SIGN BIT NOT MINUS,BRANCH TO JUMP
MDR FACTOR,BASE MULTIPLY FACTOR REG BY BASE NO REG
JUMP LTR EXPN,EXPN CHECK IF EXPONENT PLUS,MINUS,OR ZERO
BC 8,NEXT IF EXPONENT NOW ZERO, BRANCH TO NEXT
MDR BASE,BASE MULTIPLY BASE NO BY DOUBLING ITSELF
BC 15,LOOP BRANCH TO LOOP TO TEST NEXT EXPN BIT

In any case, you haven't done much optimization here. You have
simply replaced arithmetic operators with equivalent (assuming
positive values) bitwise operators. This sort of optimization is
usually done by the compiler anyway and there's not much point
worrying about it yourself.

In this case, the replacement of y/=2 with y>>=1 really does make
the function faster. Because the compiler can't assume that y > 0,
it has to generate code for y/=2 that works properly for negative
numbers, which adds a few instructions in the loop.

If you add a check for negative y at the beginning of both
functions:

if (y < 0) return 0;

then the compiler has enough information to optimize the y/=2 into
the exact same code as y>>=1, theoretically at least. In practice
the compiler I'm using doesn't perform the optimization even with
that extra information.

It is worthwhile to put the check in anyway so that the function
doesn't return the wrong answer for negative exponents.

I do have an optimization for this code. It's not guaranteed to
make the code faster, but it does make it faster on my machine.
BTW its not a homework problem, if it were I'd go for a variation of

double ipow(double x, int n)
{
return (!n)?1:(n<0)?ipow(1/x,-n):(n&1)?x*ipow(x,n-1):ipow(x*x,n/2);
}

from Mark Stephen on comp.lang.c.moderated a few years back!
which would give:

int kpow(int x, int n)
{
return (!n)?1:(n&1)?x*kpow(x,n-1):kpow(x*x,n/2);
}

Presumably with tail recursion this version is quite efficient?


You can't rely on tail recursion being optimized in C compilers,
although it's not uncommon for there to be a switch that enables
it.

Did you try timing kpow() and your functions and see what is
faster? On my system (Athlon XP 1800+, gcc 3.3.1) your
functions are twice as fast as kpow(). In fact, pow_b() is
slightly faster on my system than James Hu's approach that
unrolled the inner loop.

I came up with code pretty much exactly like your pow_b() a few
days ago by translating the OS/360 assembler above into C. One
small change then made it a bit faster. You'll be able to see the
change easily in the code below.

Here are several different functions and their execution times.
The main() used for testing was:

static volatile int dest;
int main (void)
{
int y;

for (y = 0; y < 16; ++y)
{
int x = 10000000; /* ten million */
while (--x)
dest = powfunction(3,y);
}

return 0;
}

The testing program, including the various pow functions was
compiled with the following command line in gcc 3.3.1:

$ gcc -Wall -W -O2 -fomit-frame-pointer -march=athlon -o ipows ipows.c -lm

Now for the various functions and the results, starting with the slowest
and moving to the fastest. The program was compiled replacing "powfunction"
with each of the different contenders and then was run 3 times. The
best time of the 3 trials is given below:

1) Compact recursive version (ala Mark Stephen)

static int kpow(int x, int n)
{
if (n < 0) return 0;
return (!n)?1:(n&1)?x*kpow(x,n-1):kpow(x*x,n/2);
}

$ time ./ipows

real 0m7.403s
user 0m6.630s
sys 0m0.030s

2) Your pow_a()

static int pow_a(int x, int y)
{
int t=1;

if (y < 0) return 0;
for(;;)
{
if (y%2) t*=x; /* if y is odd... */
if (!(y/=2))break; /* halve y and if result is zero... */
x*=x; /* square x */
}

return t;
}

$ time ./ipows

real 0m4.395s
user 0m3.920s
sys 0m0.020s

3) Unrolled version (ala James Hu)

static signed char hbit[32] = {
-1,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
};

static int hupow (int x, int n)
{
int t = 1;

if (n < 0) return 0;
if (n == 0) return 1;
if (n == 1) return x;
if (x == 0) return 0;

switch (hbit[n]) {
case 4: if (n & 1) t *= x; n >>= 1; x *= x;
case 3: if (n & 1) t *= x; n >>= 1; x *= x;
case 2: if (n & 1) t *= x; n >>= 1; x *= x;
case 1: if (n & 1) t *= x; n >>= 1; x *= x;
default: t *= x;
}
return t;
}

$ time ./ipows

real 0m3.332s
user 0m2.990s
sys 0m0.010s

4) Your pow_b()

static int pow_b(int x, int y)
{
int t=1;

if (y < 0) return 0;
for(;;)
{
if (y&1) t*=x; /* if y is odd... */
if (!(y>>=1))break; /* halve y and if result is zero... */
x*=x; /* square x */
}

return t;
}

$ time ./ipows

real 0m3.208s
user 0m2.870s
sys 0m0.020s

5) My translation of the OS/360 code

static int ibmpow (int x, int y)
{
int factor = 1;

if (y < 0) return 0;

if (y & 1) factor *= x;
y >>= 1;

while (y)
{
x *= x;
if (y & 1) factor *= x;
y >>= 1;
}

return factor;
}

$ time ./ipows

real 0m2.845s
user 0m2.480s
sys 0m0.080s

-Sheldon

Nov 13 '05 #3
Sheldon Simms <sh**********@yahoo.com> wrote in message news:<pa****************************@yahoo.com>...
On Sun, 02 Nov 2003 13:14:18 +0000, silly wrote:
Did you try timing kpow() and your functions and see what is
faster? On my system (Athlon XP 1800+, gcc 3.3.1) your
functions are twice as fast as kpow(). In fact, pow_b() is
slightly faster on my system than James Hu's approach that
unrolled the inner loop.
kpow() is not really tail recursion, so there is no way for
the compiler to optimize it into a loop.

My unrolled implementation would probably need a computed
goto for the array to improve the time. But realize that
the version you used as the basis for you experiment was
trying to avoid implementation defined behavior.
I came up with code pretty much exactly like your pow_b() a few
days ago by translating the OS/360 assembler above into C. One
small change then made it a bit faster. You'll be able to see the
change easily in the code below.
It was unclear to me why your small change should make any
difference. Do you have a theory?

Please compare your routine with the routine below based on the
refinement suggested by Tim Woodall and the implementation by
Eric Sosman:

int tim_eric_pow(int x, unsigned n)
{
int t = 1;
if (n == 0) return 1;
while (n ^ 1) {
n >>= 1; x *= x;
}
t = x;
while (n >>= 1) {
x *= x;
if (n & 1) t *= x;
}

return t;
}
5) My translation of the OS/360 code

static int ibmpow (int x, int y)
{
int factor = 1;

if (y < 0) return 0;

if (y & 1) factor *= x;
y >>= 1;

while (y)
{
x *= x;
if (y & 1) factor *= x;
y >>= 1;
}

return factor;
}


The tail recursive form for this would be:

static int r_ibmpow2(int x, int y, int factor)
{
if (y == 0) return factor;
x *= x;
if (y & 1) factor *= x;
return r_ibmpow2(x, y >> 1, factor);
}

static int ibmpow2(int x, int y)
{
if (y < 0) return 0;
if (y & 1) return r_ibmpow2(x, y, x);
return r_ibmpow2(x, y, 1);
}

But, you would have to use -O3 to get gcc to remove the tail recursion.

-- James
Nov 13 '05 #4

"James Hu" <jx*@despammed.com> wrote in message
news:b3**************************@posting.google.c om...
Sheldon Simms <sh**********@yahoo.com> wrote in message

news:<pa****************************@yahoo.com>...
On Sun, 02 Nov 2003 13:14:18 +0000, silly wrote:
Did you try timing kpow() and your functions and see what is
faster? On my system (Athlon XP 1800+, gcc 3.3.1) your
functions are twice as fast as kpow(). In fact, pow_b() is
slightly faster on my system than James Hu's approach that
unrolled the inner loop.


kpow() is not really tail recursion, so there is no way for
the compiler to optimize it into a loop.


Just inserting my two-bits...

You guys may also want to try a sliding window approach. You're using a
binary exponentiation which is fast but sliding windows are cooler. :-)

Tom
Nov 13 '05 #5
On Mon, 03 Nov 2003 02:40:05 -0800, James Hu wrote:
Sheldon Simms <sh**********@yahoo.com> wrote in message news:<pa****************************@yahoo.com>...

My unrolled implementation would probably need a computed
goto for the array to improve the time. But realize that
the version you used as the basis for you experiment was
trying to avoid implementation defined behavior.
I came up with code pretty much exactly like your pow_b() a few
days ago by translating the OS/360 assembler above into C. One
small change then made it a bit faster. You'll be able to see the
change easily in the code below.
It was unclear to me why your small change should make any
difference. Do you have a theory?


I know why it's faster, but that comes from looking at the
assembly code generated. I actually only made the change to
avoid the infinite-loop-with-break construct that comes
out of the direct translation of the OS/360 code. Here's the
code again:

static int ibmpow (int x, int y)
{
int factor = 1;
if (y < 0) return 0;

if (y & 1) factor *= x;
y >>= 1;

while (y)
{
x *= x;
if (y & 1) factor *= x;
y >>= 1;
}

return factor;
}

One reason it is faster is that the generated code avoids the
first multiplication for odd y by replacing the if-statement
before the loop with this:

if (y & 1) factor = x;
y >>= 1;

But that's not everything. I tested by replacing the test loop
with

for (y = 1; y < 32; y <<= 1)

And the ibmpow() version is still faster than pow_b().

I can see that the generated code for the loop in ibmpow() is
a bit better than that for pow_b(). That's interesting since
the statements are the same, only rearranged.
Please compare your routine with the routine below based on the
refinement suggested by Tim Woodall and the implementation by
Eric Sosman:

int tim_eric_pow(int x, unsigned n)
{
int t = 1;
if (n == 0) return 1;
while (n ^ 1) {
n >>= 1; x *= x;
}
t = x;
while (n >>= 1) {
x *= x;
if (n & 1) t *= x;
}

return t;
}
I must have missed that one in the thread. After small changes so
that it conforms to the same interface as the others (signed n, check
for n < 0), it's the fastest so far:

$ time ./ipows

real 0m2.032s
user 0m1.980s
sys 0m0.000s
The tail recursive form for this would be: .... But, you would have to use -O3 to get gcc to remove the tail recursion.


I avoided O3 because it inlined the power function, sometimes.

-Sheldon

Nov 13 '05 #6
On 2003-11-03, Sheldon Simms <sh**********@yahoo.com> wrote:
On Mon, 03 Nov 2003 02:40:05 -0800, James Hu wrote:
int tim_eric_pow(int x, unsigned n)
{
int t = 1;
if (n == 0) return 1;
while (n ^ 1) {
n >>= 1; x *= x;
}
t = x;
while (n >>= 1) {
x *= x;
if (n & 1) t *= x;
}

return t;
}


Doh! That's what I get for composing from memory. The first while loop
needs to test for a leading 0 bit. The original by Eric was right:

while (!(n&1)) {
...

But I guess you can compare that with:

while ((n&1)^1) {
...

-- James
Nov 13 '05 #7
jx*@despammed.com (James Hu) wrote in message >
int tim_eric_pow(int x, unsigned n)
{
int t = 1;
if (n == 0) return 1;
while (n ^ 1) { ^^^^^
What is this? Did I really write something like that? Apart from the
fact that it is wrong, I HATE clever tricks like that even when they
are right! Using xor to perform an XOR is fine, using it to test a bit
is HORRIBLE! (YMMV :-)
n >>= 1; x *= x;
}
t = x;
while (n >>= 1) {
x *= x;
if (n & 1) t *= x;
}

return t;
}


Tim.
Nov 13 '05 #8
On Mon, 03 Nov 2003 11:31:21 -0600, James Hu wrote:
On 2003-11-03, Sheldon Simms <sh**********@yahoo.com> wrote:
On Mon, 03 Nov 2003 02:40:05 -0800, James Hu wrote:
int tim_eric_pow(int x, unsigned n)
{
int t = 1;
if (n == 0) return 1;
while (n ^ 1) {
n >>= 1; x *= x;
}
t = x;
while (n >>= 1) {
x *= x;
if (n & 1) t *= x;
}

return t;
}

Doh! That's what I get for composing from memory. The first while loop
needs to test for a leading 0 bit. The original by Eric was right:


Here I am, a trusting soul, not bothering to make sure that the
function actually works. So anyway the time I gave earlier is wrong.
This (corrected) function is still the fastest, but just barely.
while (!(n&1)) {
...
$ time ./ipows

real 0m2.431s
user 0m2.400s
sys 0m0.010s

But I guess you can compare that with:

while ((n&1)^1) {
...


$ time ./ipows

real 0m2.435s
user 0m2.410s
sys 0m0.000s
Nov 13 '05 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

303
by: mike420 | last post by:
In the context of LATEX, some Pythonista asked what the big successes of Lisp were. I think there were at least three *big* successes. a. orbitz.com web site uses Lisp for algorithms, etc. b....
22
by: bearophile | last post by:
Ville Vainio: >It's highly typical for the newbies to suggest improvements to the >language. They will usually learn that they are wrong, but the >discussion that ensues can be fruitfull anyway...
19
by: shanx__=|;- | last post by:
hi i need some help regarding use of very very long integer datatype in 'c'.. i need it to store result of large number's factorial.. if someone can healp it would be a delight..
21
by: utab | last post by:
Hi there, Is there a way to convert a double value to a string. I know that there is fcvt() but I think this function is not a part of the standard library. I want sth from the standard if...
42
by: yong | last post by:
Hi all I have an large integer in this format x1*256^5 + x2*256^4 + x3*256^3 + x4*256^2 + x5*256 + x6 now I must convert it to this format y1*900^4 + y2*900^3 + y3*900^2 + y4*900 + y5 x1-x5...
19
by: Robbie Hatley | last post by:
For your enjoyment, a function that expresses any integer with absolute value less-than-or-equal-to nine quintillion in any base from 2 to 36. (For larger bases, you could expand the "digit"...
8
by: Chris Stankevitz | last post by:
Q1: Does c++ provide pow(int,int)? Q2: If not, why not? Thanks, Chris
9
Ganon11
by: Ganon11 | last post by:
I was working on a program this morning that required me to use the pow function (to compute 2^i at one point, and 3^i at another). I used the following code: std::vector<int>...
18
by: Peng Yu | last post by:
Hi, I'm wondering if there is any general guideline on when to using something like std::pow(x, n) rather than x * x * x * ... * x (n x's). Thanks, Peng
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.