Does using too many () in C++ slow down the application?

Recently a friend of mine suggested that a program may slow down if it uses too many parentheses or (). I thought it would make an interesting blog post to look at what happens behind the scenes when we use () in C++ and whether this is a myth or reality.



Note – Parentheses or () are used for many purposes in C/C++ and other languages, but the scope of this post is limited to the use of parentheses in arithmetic expressions to split them into sub-expressions. Also, I am going to write () rather than typing the whole word ‘parentheses’ from now on.
I am posting this only because I found it interesting to do some research on the topic and felt it would be a good and knowledgeable thing to share.

So what was suggested, by way of an example, is: if we are given two statements –

  • A : (((a+b)+c)+d)
  • B : (a+b)+(c+d)

B will be faster than A, and if we run both of them, say, 1,000,000+ times, we will see that B is significantly faster than A.

Question 1: What do you think?

  1. B is faster than A
  2. B and A are the same

Question 2: Does using many () affect the runtime performance?

  1. Yes (means B is faster than A)
  2. No (means option 2 in the question above)

Also, what’s the reason behind whatever option you think is correct?

I selected option 2 for Question 1 and option 2 for Question 2 before actually performing any real experiment. Now let’s see what really happens behind the scenes, find the right answer and the reason behind it, and see whether I was right or wrong.

What I think is that () are just used to identify different sub-expressions within a complex expression at compile time. That is, () tells the compiler that everything inside () should be treated as a single entity and evaluated before any operator in the parent expression is applied. The compiler then uses this order to generate machine-level instructions that actually perform those ordered operations at runtime. So () should have no effect on runtime execution.

e.g. a = b*(c+d); here () tells the compiler to treat (c+d) as a single entity in the parent expression, so the value of c+d is multiplied by b. If we don’t use () and the expression is a = b*c+d; then it is evaluated as (b*c) + d. (This comes from operator precedence.)

So (((a+b)+c)+d) is the same as (a+b)+(c+d), hence option 2 for the 1st question and option 2 for the 2nd question. () is used only at compile time to identify sub-expressions, and the assembly code for both statements will be the same. That was my thinking/reasoning behind my answers.

Now, to verify this, I wrote code in VC++ 2010 and checked the assembly generated for both expressions (plus the version with no () at all):

A : r = (((a+b)+c)+d);
00D313C1 mov eax,dword ptr [a]
00D313C4 add eax,dword ptr [b]
00D313C7 add eax,dword ptr [c]
00D313CA add eax,dword ptr [d]
00D313CD mov dword ptr [r],eax

B : r = (a+b)+(c+d);
00D313D0 mov eax,dword ptr [a]
00D313D3 add eax,dword ptr [b]
00D313D6 mov ecx,dword ptr [c]
00D313D9 add ecx,dword ptr [d]
00D313DC add eax,ecx
00D313DE mov dword ptr [r],eax

C : r = a + b + c + d;
00E213E1 mov eax,dword ptr [a]
00E213E4 add eax,dword ptr [b]
00E213E7 add eax,dword ptr [c]
00E213EA add eax,dword ptr [d]
00E213ED mov dword ptr [r],eax

Let’s analyze what’s really happening in each of them.

  • In case of A: r = (((a+b)+c)+d); – mov, add, add, add, mov
  • In case of B: r = (a+b)+(c+d); – mov, add, mov, add, add, mov
  • In case of C: r = a + b + c + d; – mov, add, add, add, mov

So as you can see, there’s an extra mov in B, the version that was supposed to be faster. That extra mov instruction tells us B may actually be slower than A, by the time required to process one extra mov (if that takes any noticeable time at all).

Also, the assembly code for a + b + c + d is the same as that for A, which confirms that there is no trace of () in the generated runtime code, so () itself does not affect performance. Hence, to a first approximation, (((a+b)+c)+d), (a+b)+(c+d) and a + b + c + d should take the same time at execution.

That being said, sometimes a particular placement of () can generate extra instructions, as in B, which gets an extra mov. Whether that affects performance in any noticeable way is left as an exercise for the reader; I would love to hear if anyone finds something interesting there.

So what I would like to suggest is that we should use () liberally in complex expressions to clearly identify/separate sub-expressions. Yes, some people may say we can just remember the BODMAS rule, but expressions may contain many more operators than +, -, /, *. One may then say we can learn the full precedence table, but let me ask: how many of you actually remember it?

I am not saying we shouldn’t know precedence and order of evaluation. What I am saying is that no matter how experienced you are, if you read a complex expression without (), you will pause to figure out the order in which its operations execute. Also, in office environments, working fast and under pressure, we may write something in a hurry, thinking it should work and ignoring all of the above, whereas in reality things turn out differently and bugs creep into the program.

For example, you may write something like –

a = b+c*d%e when you actually meant a = b+(c*(d%e)) – since * and % have equal precedence and associate left to right, the first form is actually evaluated as b + ((c*d) % e).

So to avoid any such trouble, we should use (), especially in expressions involving more than one operator, to improve readability and avoid unnecessary mistakes and bugs.


About chetanjags

Game Developer

Posted on March 7, 2012, in C++, Programming. 7 Comments.

  1. Well, if we consider execution times, I tested all three options above and it came out A > B > C with really slight differences. And at the -O2 level of optimization no time differences can be seen.

    I tested each of the formulas in a 1,000,000,000-iteration loop and multiplied each variable by cos(i), forcing the compiler to compute every iteration.

    As for the compiler-generated assembly files: obviously the number of instructions depends on the architecture you compile your program for. I do not know every architecture on the market, but Intel processors tend to have a sort of accumulator register – “eax” and “ecx” in your case. These accumulator registers are the only destination registers for arithmetic instructions. Therefore (a+b)+(c+d) takes one more instruction to complete. In options A and C you can put your first variable into the accumulator and let b, c and d “accumulate” onto a.

    On the other hand, another architecture that I know, MIPS, doesn’t have any accumulator registers, which allows the destination register to be any register in the register file. In such a case the code for all three options would require the same instruction sequence, which is basically “add add add”.

    I hope my addition helps.

  2. @ekayrakli
    Thanks, great info there on the topic.

    Well, I tested A and B only, and that too in debug mode, and yeah, I also got A > B by a slight difference. And I also thought the same about A: it’s like placing the first var on the stack and just throwing the others on top of it one by one, whereas in B it’s like calculating 2 results in different locations, then moving to one of them and adding the other.
    I think I will check all 3 again in release mode with optimization and will try to figure out why C is lagging behind, because I thought it would be the same as A since the same assembly code is generated.
    Can you please post the assembly code generated in your case?

  3. Ok, I did those tests, running each 100,000,000 times and repeating the whole test 5-6 times. I am getting A faster than C faster than B in Release mode, with O2-level optimisation turned on, but the differences are very low.

  4. First of all here is my GAS code for three different formulas:

    movl -16(%rbp), %eax
    movl -20(%rbp), %edx
    addl %edx, %eax
    addl -12(%rbp), %eax
    addl -8(%rbp), %eax
    movl %eax, -4(%rbp)

    movl -16(%rbp), %eax
    movl -20(%rbp), %edx
    leal (%rdx,%rax), %ecx
    movl -8(%rbp), %eax
    movl -12(%rbp), %edx
    addl %edx, %eax
    addl %ecx, %eax
    movl %eax, -4(%rbp)

    movl -16(%rbp), %eax
    movl -20(%rbp), %edx
    addl %edx, %eax
    addl -12(%rbp), %eax
    addl -8(%rbp), %eax
    movl %eax, -4(%rbp)

    There is one important thing to note though: you cannot assume that if two code segments issue the same instructions, they must complete in the same amount of time. Consider the assembly code you gave (mine seems a little “compiler-touched”).

    In A:

    00D313C1 mov eax,dword ptr [a]
    00D313C4 add eax,dword ptr [b]
    00D313C7 add eax,dword ptr [c]
    00D313CA add eax,dword ptr [d]
    00D313CD mov dword ptr [r],eax

    the instructions at C4, C7 and CA each depend on the previous instruction, i.e. you need to finish “mov eax,…” first to be able to do “add eax,…”. I think there is no need for further explanation.

    Whereas in B:

    00D313D0 mov eax,dword ptr [a]
    00D313D3 add eax,dword ptr [b]
    00D313D6 mov ecx,dword ptr [c]
    00D313D9 add ecx,dword ptr [d]
    00D313DC add eax,ecx
    00D313DE mov dword ptr [r],eax

    D3 depends on D0 and D9 depends on D6; however, these two instruction pairs can run in parallel. I do not know how familiar you are with computer architecture concepts, but today’s processors (even most processors of the uni-core era) have superscalar pipelines. You can search the term and find a myriad of sources on how exactly they work. But simply put: even though the code is not explicitly parallel, a processor can issue multiple instructions at the same time if it decides (in hardware) that there is no dependency.

    So, even if you have a single-core CPU (assuming you bought it in the last 20 years), it can execute D0 and D6 in one cycle and then D3 and D9 in the next, which cannot be done for the code in A even on a multicore machine.

    Sorry, if the answer is too detailed 🙂

  5. A very well-written conclusion; prefer readability over “very minor performance gains.”

    • Well said, Manish, but in this case it’s more like prefer readability over “SAME PERFORMANCE”,
      because omitting () doesn’t affect performance in any way.

      The discussion related to assembly code is a whole different scenario, in which some specific ordering of sub-expressions might provide a performance boost, but again it doesn’t matter if you build your program with optimisation turned on.

  6. Oh great, I really missed that. I didn’t go that deep.
    This is just getting better and better. Of course that should be the case.
    Also going to run some more tests, maybe on some different architectures (maybe on mobo too).
