Hi, I think it is because the compiler needs to perform an additional calculation (a subtraction) to evaluate "i < size" and then test for a negative value, while it can simply test for zero in the second case.
Ginsengelf
AFAIK, Ginsengelf is right. The better performance of a descending for loop is related to the low-level operations (compare, increment, decrement) done at each iteration.
I think this kind of descending for is faster than yours; it was heavily used in J2ME (cellphones), where every kind of optimization used to be very welcome:
Qt Code:
int size = 100000;
int array[size];
for (int i = size; --i >= 0;) {
    // do something
}
tinsukE
In my opinion "// do something" takes longer than those comparisons, so it's not worth wasting time optimizing the for() itself when you can spend it optimizing its contents. Also bear in mind that even though you write a statement one way in your code, the compiler performs optimizations, so the final binary code may look different. In the end it all comes down to using BNE or a similar instruction, so as long as its argument is constant (does not need to be recalculated on each iteration) there is no difference in execution time.
Second that. It will be very hard to find a case where // do something is less significant than the loop logic; moreover, optimizers are very smart, so I doubt that at the cycle level there will be much difference between the possible forms of a for loop.
What might be worth considering, though, is using pre-increment (++i) vs post-increment (i++) for non-builtin types. A typical post-increment operator for an object creates a copy of the object, increments the original, and then returns the copy. Pre-increment just increments the object and returns a reference to it.
The increment/decrement question may only yield academic differences; however, there is a grain of truth to the claim for constructs such as:
Qt Code:
for (int x = obj.count() - 1; x >= 0; x--)
    doSomething(x); // faster

for (int x = 0; x < obj.count(); x++)
    doSomething(x); // slower
The method obj.count() will only get called once in the decrementing loop.
Sure, but there's a rule of thumb stating that loop-invariant code should be hoisted outside the loop; then the problem no longer exists:
Qt Code:
const int iCount = obj.count();
for( int i = 0 ; i < iCount ; ++i )
    doSomething();

// the same..
for( int i = iCount - 1 ; i >= 0 ; --i )
    doSomething();
I also think that when 'obj' is not modified in the loop, any decent optimizer will spot that chance and automatically generate the code above, i.e. create a local variable for the count.
I still say doSomething() has more weight, even if it is an empty function call - creating a stack frame is more expensive than reading a variable or comparing it to another one. If you want to make such statements, at least paste a profiler log or a compiler output to prove it!
that's certainly true; as I said I do not think descending/ascending will matter, but the pre/post-increment does.
Quote: "If you want to make such statements, at least paste a profiler log or a compiler output to prove it!"
As you insist...
here's a quick test, no optimisations used: a post-increment standard for-loop vs pre-increment with a local variable for the vector end:
Qt Code:
void doSomething( unsigned )
{
}

void test( const size_t ac_nTestSize )
{
    std::vector< unsigned > x( ac_nTestSize );
    std::vector< unsigned > y( ac_nTestSize );

    time::QueryPerformanceTimer tim;

    tim.mp_Start();
    std::vector< unsigned >::const_iterator cit;
    for( cit = x.begin() ; cit < x.end() ; cit++ )
        doSomething( *cit );
    s_dbgpf( "post-inc took: %.3f", tim.mf_dElapsedMilliSeconds() );

    tim.mp_Start();
    std::vector< unsigned >::const_iterator cit2;
    const std::vector< unsigned >::const_iterator citEnd( y.end() );
    for( cit2 = y.begin() ; cit2 < citEnd ; ++cit2 )
        doSomething( *cit2 );
    s_dbgpf( "pre-inc took: %.3f", tim.mf_dElapsedMilliSeconds() );
}
output for different sizes:
10:
post-inc took: 0.040
pre-inc took: 0.009
100:
post-inc took: 0.204
pre-inc took: 0.013
1000:
post-inc took: 2.823
pre-inc took: 0.113
10000:
post-inc took: 20.146
pre-inc took: 0.712
I have to say I really had no idea, until just now, that it makes such a difference..
These numbers are not representative. You should measure the time used by the process, not the wall-clock time elapsed. Another thing is that the increased delay comes from both the recalculation of end() and the post-increment operator. Surely iterating over 20k elements takes less than 20 seconds.
Anyway that all is nothing compared to the contents of doSomething().
Well, I'm not sure whether or not this is representative; I ran each test a couple of times and each time the result was more or less the same (the times are in msec, btw). Also, the first case executes a lot more instructions.
Can't I conclude, then, that for these particular pieces of code the second one is noticeably faster than the first, though they both yield the same result? Note that this is just about pre-increment vs post-increment, completely ignoring the importance of doSomething().
how do I measure the process time btw?
The fact that you can reproduce it doesn't mean it's representative.
Quote: "Can't I conclude then, that for these particular pieces of code, the second one is noticeably faster than the first one, though they both yield the same result?"
Sure, it's faster. Noticeably faster - I'm not sure. Biased to prove the point - definitely so.
It is true that post-increment is in most cases slower than pre-increment, but the for() loop has nothing to do with it, so let's not exaggerate it - we mostly iterate fors using integers.
Quote: "how do I measure the process time btw?"
Using clock(), times() or the "time" shell command. Or a profiler.
mmm, good point
Quote: "Biased to prove the point - definitely so"
yeah I know, I'm a bit too keen on details sometimes...
Quote: "Using clock(), times() or the "time" shell command. Or a profiler."
I'm afraid I don't understand this: what's the difference between using QueryPerformanceCounter() or rdtsc() and using e.g. clock()? Each returns a timer value, possibly from different timers, but how else than measuring a start and a stop time and subtracting the two can one measure execution time? Doesn't a profiler work like that?
Suppose I use time on the shell; wouldn't (time at process stop - time at process start) be equal to the two time values I measure inside the test() function, plus the time needed for program init etc.?
@TS: sorry for the off-topic discussion, but on the other hand, we'll probably learn from it, seems to me wysota is rather experienced!