Hi, I think it is because the compiler needs to perform an additional calculation (a subtraction) to evaluate "i < size" and then test for a negative value, while it can simply test for zero in the second case.
Ginsengelf
AFAIK, Ginsengelf is right. The better performance of a descending for loop is related to the low-level operations (compare, increment, decrement) done at each iteration.
I think this kind of descending for is faster than yours; it was heavily used in J2ME (cellphones), where every kind of optimization used to be very welcome:
Qt Code:
int size = 100000;
int array[size];
for (int i = size; --i >= 0;) {
    // do something
}
tinsukE
In my opinion "// do something" takes longer than those comparisons, so it's not worth wasting time optimizing the for() itself when you can spend it optimizing its contents. Also bear in mind that even though you write a statement one way in your code, the compiler performs optimizations, so the final binary code may look different. In the end it all comes down to using BNE or a similar instruction, so as long as its argument is constant (does not need to be recalculated on each iteration) there is no difference in execution time.
Second that. It will be very hard to find a case where // do something is less significant than the loop logic; moreover, optimizers are very smart, so I doubt that at the cycle level there will be much difference between the possible forms of a for loop.
What might be worth considering, though, is using pre-increment (++i) vs post-increment (i++) for non-builtin types. A typical post-increment operator for an object creates a copy of the object, increments the original, and then returns the copy. Pre-increment just increments the object and returns a reference to it.
The increment/decrement question may only yield academic differences; however, there is a grain of truth to the claim for constructs such as:
Qt Code:
for (int x = obj.count() - 1; x >= 0; x--)
    doSomething(x); // faster

for (int x = 0; x < obj.count(); x++)
    doSomething(x); // slower
The method obj.count() will only get called once in the decrementing loop.
Sure, but there's a rule of thumb stating that loop-invariant code should be hoisted outside the loop; then the problem no longer exists:
Qt Code:
const int iCount = obj.count();
for( int i = 0 ; i < iCount ; ++i )
    doSomething();

// the same..
for( int i = iCount - 1 ; i >= 0 ; --i )
    doSomething();
I also think that when 'obj' is not modified in the loop, any decent optimizer will spot that chance and automatically generate the code above, i.e. create a local variable for the count.
I still say doSomething() has more weight, even if it is an empty function call - creating a stack frame is more expensive than reading a variable or comparing it to another one. If you want to make such statements, at least paste a profiler log or a compiler output to prove it!
that's certainly true; as I said I do not think descending/ascending will matter, but the pre/post-increment does.
Quote: "If you want to make such statements, at least paste a profiler log or a compiler output to prove it!"
As you insist...
here's a quick test, no optimisations used: a post-increment standard for-loop vs pre-increment with a local variable for the vector end:
Qt Code:
void doSomething( unsigned )
{
}

void test( const size_t ac_nTestSize )
{
    std::vector< unsigned > x( ac_nTestSize );
    std::vector< unsigned > y( ac_nTestSize );

    time::QueryPerformanceTimer tim;

    tim.mp_Start();
    std::vector< unsigned >::const_iterator cit;
    for( cit = x.begin() ; cit < x.end() ; cit++ )
        doSomething( *cit );
    s_dbgpf( "post-inc took: %.3f", tim.mf_dElapsedMilliSeconds() );

    tim.mp_Start();
    std::vector< unsigned >::const_iterator cit2;
    const std::vector< unsigned >::const_iterator citEnd( y.end() );
    for( cit2 = y.begin() ; cit2 < citEnd ; ++cit2 )
        doSomething( *cit2 );
    s_dbgpf( "pre-inc took: %.3f", tim.mf_dElapsedMilliSeconds() );
}
output for different sizes:
10:
post-inc took: 0.040
pre-inc took: 0.009
100:
post-inc took: 0.204
pre-inc took: 0.013
1000:
post-inc took: 2.823
pre-inc took: 0.113
10000:
post-inc took: 20.146
pre-inc took: 0.712
I have to say I really had no idea, until just now, that it makes such a difference..
These numbers are not representative. You should measure the time used by the process, not the wall-clock time elapsed. Another thing is that the increased delay comes from both the recalculation of end() and the post-increment operator. Surely iterating over 20k elements takes less than 20 seconds.
Anyway that all is nothing compared to the contents of doSomething().
Well, I'm not sure whether or not this is representative; I ran each test a couple of times and each time the result was more or less the same (the times are in msec, btw). Also, the first case executes a lot more instructions.
Can't I conclude, then, that for these particular pieces of code the second one is noticeably faster than the first, though they both yield the same result? Note that this is just about pre-increment vs post-increment, completely ignoring the importance of doSomething().
how do I measure the process time btw?
The fact that you can reproduce it doesn't mean it's representative.
Quote: "Can't I conclude then, that for these particular pieces of code, the second one is noticeably faster than the first one, though they both yield the same result?"
Sure, it's faster. Noticeably faster - I'm not sure. Biased to prove the point - definitely so.
It is true that post-increment is in most cases slower than pre-increment, but the for() loop has nothing to do with it, so let's not exaggerate it - we mostly iterate fors using integers.
Quote: "how do I measure the process time btw?"
Using clock(), times() or the "time" shell command. Or a profiler.
mmm, good point
Quote: "Biased to prove the point - definitely so"
yeah I know, I'm a bit too keen on details sometimes...
Quote: "Using clock(), times() or the "time" shell command. Or a profiler."
I'm afraid I don't understand this: what's the difference between using QueryPerformanceCounter() or rdtsc() and using e.g. clock()? Each returns a timer value, possibly from different timers, but how else than measuring a start and a stop time and subtracting the two can one measure execution time? Doesn't a profiler work like that?
Suppose I use time on the shell; wouldn't (time at process stop - time at process start) be equal to the two time values I measure inside the test() function, plus the time needed for program init etc.?
@TS: sorry for the off-topic discussion, but on the other hand, we'll probably learn from it, seems to me wysota is rather experienced!