Re : A92: Re: optimization
In a message dated 29/08/98 19:14:36, you wrote:
>
> well...
>
> Some may disagree, but I think that the human brain has an advantage over
> optimizing compilers (high-level to low-level). Although, for example, the
I do disagree... I think we are not very efficient, and that whatever code is
generated, the time spent is not worth the work compared to the little speed
gained.
However, just for fun, I do like to program in ASM...
> Microsoft C compiler has three ways to handle the "switch" statement.
> Depending on which produces either less code or faster code, that method is
> chosen. Unfortunately this is where the human can't compete. It takes long
> enough to program one method.
>
> But, all of us programmers (I mean everyone...) know that there are times
> when we find these really "optimized" bits of code. I'm always proud of a
> section of code that most people can't understand without comments, because
> it manipulates information in an "odd" way such that the program size is
> reduced or speed is increased.
>
> Back to your question, optimization has two parts--size and speed. (For
> most cases) a program CANNOT be optimized for both speed and size. You can
> use ASM instructions that use few "processor instructions" so the program
> can run faster. But you usually will have MORE commands in the source code,
> so the size of the program increases. If you choose to use ASM instructions
> that "do more" for the size, you will be using more "processor instructions"
> and hence slowing down the program.
My aim is to produce fast code: I do not really care about the size.
>
> We TI-calc programmers have a problem in that we want both size and speed
> optimization. (Since I only have a TI-92, I'll exclude other calcs... but
> you can "fill in the blanks") We only have about 64k of working space (that
> has to be shared with everyone else's code). The 10MHz processor seems fast
> for everyone else's code, until we try to program that really awesome idea.
> (Anyone else seem to have this problem? :)
>
>
> Here are a few tips for optimizations (PROJECT ALERT: Create a webpage with
> information for optimizing various types of code.)
Yes: I will add a lesson on optimization in future releases of the 92guide...
> * Stack "clean-up"... I've seen several choices for fixing the stack
> pointer after function calls. The best I've seen is a simple "ADD.W #?,a7"
> where the ? is the byte offset. I'm pretty sure that the MOVE's and PEA's
> are slower than the ADD (considering that the addressing mode ?(a7) used
> with fixing the stack pointer probably uses the ADD).
For speed, using a function is not the best solution. A macro would be
better, since it leaves no need for the stack clean-up.
> * Fast multiplication/division... When you have a power of two (and
> actually this method can apply to any problem, but it takes some math
> understanding), you can use LSL to multiply and LSR to divide. The standard
> instruction MULU #16,d0 would become LSL #4,d0. Likewise, DIVU #4,d0 would
> be LSR #2,d0. We all know that divides are MUCH slower than logical shifts.
Yes
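
As a rough illustration of the shift trick quoted above, here is a minimal C sketch. The function names are hypothetical and not from the thread. It only holds for unsigned values (a logical right shift does not round the way a signed divide does), and it assumes the result still fits in the operand size.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by 16 = 2^4 with a left shift, mirroring LSL #4,d0. */
uint32_t mul16_shift(uint32_t x) { return x << 4; }

/* Divide by 4 = 2^2 with a logical right shift, mirroring LSR #2,d0. */
uint32_t div4_shift(uint32_t x) { return x >> 2; }

/* Hypothetical self-check: compare against plain * and / on small inputs. */
void check_power_of_two_shifts(void)
{
    for (uint32_t x = 0; x < 4096; ++x) {
        assert(mul16_shift(x) == x * 16u);
        assert(div4_shift(x) == x / 4u);
    }
}
```

The shift count is just the exponent of the power of two, so any 2^n constant can be handled the same way.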
> * Multiplication again... Well what if you have MULU #13,d0. Here is a
> possible solution:
>    Math analysis--> What is 13x if put into the form (2^? +- 2^? +- ...)x ?
>    Solution-------> 13x = (2^4 - (2^1 + 2^0))x
>
>    Old Code:        New Code:
>    MULU #13,d0      MOVE d0,d1   ; make a copy (d1 = x)
>                     LSL #1,d0    ; *2 (d0 is now 2^1 x)
>                     ADD d0,d1    ; add (d1 is now (2^1 + 2^0)x)
>                     LSL #3,d0    ; *8 again (d0 is now 2^4 x)
>                     SUB d1,d0    ; sub (d0 is now 13x)
>
> But Why--------> Who cares! :) The latter code may run faster (haven't
> tried) if you are doing MANY 13x's. You won't notice the difference unless
> you run it for a couple thousand iterations. 13x is an uncommon multiply in
> assembler anyway. But at least you can see how to turn any MULU into a few
> LSL/R's and ADD/SUB's.
>
It does run faster... I have tried such methods in a program of my own...
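
To check the arithmetic of the 13x sequence quoted above, here is a small C sketch that follows the same five steps, with variables named after the registers. The function names are hypothetical. It uses 32-bit values so nothing overflows. On the 68000 the shifts work on the chosen operand size, so with word-sized operations 16x must still fit in 16 bits, whereas MULU delivers a full 32-bit product.

```c
#include <assert.h>
#include <stdint.h>

/* 13x = 16x - (2x + x), step by step as in the quoted assembly. */
uint32_t mul13_shift(uint32_t x)
{
    uint32_t d0 = x;
    uint32_t d1 = d0;   /* MOVE d0,d1 : d1 = x   */
    d0 <<= 1;           /* LSL #1,d0  : d0 = 2x  */
    d1 += d0;           /* ADD d0,d1  : d1 = 3x  */
    d0 <<= 3;           /* LSL #3,d0  : d0 = 16x */
    d0 -= d1;           /* SUB d1,d0  : d0 = 13x */
    return d0;
}

/* Hypothetical self-check over every 16-bit input. */
void check_mul13(void)
{
    for (uint32_t x = 0; x <= 0xFFFFu; ++x)
        assert(mul13_shift(x) == 13u * x);
}
```

This only confirms that the decomposition is correct. Whether it is actually faster depends on the cycle counts of the instructions involved.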
>
> That's all I can come up with now... There are many ways to handle loops, if
> that is your main concern. The DBRA is pretty good, unless you need that
I don't like DBRA, as it needs a register all to itself and I am short on
registers.
BTW: could a DBRA use an address register as a counter?
> register for something else. CMP's are pretty efficient but pose problems
> when several CMP's need to be made for one branch. You might want to look
> at the output of a C compiler. You can get "ideas" for how to do your own
> code.
I have none...
However, I am looking for some info on the speed of the different
instructions. Where is such a doc available?
Mathieu