A83: Re: Joe Wingerbermuhle?



That is a handy routine.  One thing I noticed is that you could save 3
t-states per loop iteration if you loaded $ff (-1) into D or E before the
loop and then loaded C from there instead of loading an immediate (register
to register = 4t, immediate to register = 7t):

getString:
 or a
 ret z
 ld b,a
 ld d,$ff
 xor a
getStringL1:
 push bc
 ld c,d            ; reload preloaded byte counter
 cpir
 pop bc
 djnz getStringL1
 ret

Actually, now that I look at it again, the value of C never changes because
it is saved along with the loop counter.  So just load it before the loop:

getString:
 or a
 ret z
 ld b,a
 ld c,$ff            ; preload to save time, it never changes
 xor a
getStringL1:
 push bc             ; save loop counter AND byte counter
 cpir
 pop bc              ; restores loop counter AND byte counter
 djnz getStringL1
 ret

Hmm, I just noticed one more thing that would make it even faster, if you
don't mind trashing DE.  Pushes/pops are pretty slow compared to 8-bit
register instructions (since the Z80 IS 8-bit, not 16-bit).  So why not just
store the values in DE instead of pushing/popping them:

getString:
 or a
 ret z
 ld b,a
 ld e,$ff            ; preload to save time
 xor a
getStringL1:
 ld d,b              ; save the loop counter
 ld c,e              ; reload the byte counter
 cpir
 ld b,d              ; restore the loop counter
 djnz getStringL1
 ret

Ah, one more thing, last one, I promise (I know you're all getting sick of
me by now, this is beginning to look like an article on optimizing...).  Why
do all that register shuffling just to use DJNZ?  The reason that
instruction is normally faster than looping yourself is because it handles
the decrementing, comparing and jumping all in one instruction.  But the
downside is that the only register you can use is B.  B is already tied up
as the high byte of CPIR's BC byte counter, making it faster to do it
yourself with a different register:

getString:
 or a                ; is it 0?
 ret z               ; then we're already pointing to it
 ld d,a              ; D is the loop counter now
 ld e,$ff            ; preload to save time
 xor a               ; clear A, we're checking for 0
getStringL1:
 ld c,e              ; reload the byte counter
 cpir                ; find the zero byte at the end of the string
 dec d               ; decrement loop counter
 jr nz,getStringL1   ; loop if not 0 (use JP to add 1 byte, save 2 t's)
 ret                 ; the end!

Ok, I think I'm done now, that routine is about as fast as it can get.  The
only possible change (that I see!) is to swap the JR with a JP, because an
absolute jump is 2 t-states faster, though it takes an extra byte for the
16-bit address.  Just for kicks, let's see how the routines compare to each
other (note that the first time is per iteration, the second is the startup
time, and the byte size includes the RET):
                                      Bytes:  T-States:
=======================================================
Original routine:                    |  13   | 62 (23)
Preloaded byte counter:              |  14   | 59 (30)
Preloaded byte counter w/o reload:   |  13   | 55 (30)
Saved counters w/ registers:         |  14   | 46 (30)
Final routine:                       |  13   | 41 (30)
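
For reference, here is how the per-iteration figures for the first and last
rows add up, assuming CPIR is counted as a single 21 t-state repeating pass
and DJNZ (13) and JR (12) are counted as taken.  In practice CPIR costs 21
t-states for every byte it skips and 16 for the byte that matches, so the
real time depends on the string lengths:

Original:  push bc (11) + ld c,-1 (7) + cpir (21) + pop bc (10) + djnz (13) = 62
Final:     ld c,e (4) + cpir (21) + dec d (4) + jr (12)                    = 41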

I hope I got those times/bytes right.  Anyway, this shows that you can
optimize almost any routine, no matter how small or optimized it looks.  We
saved 21 t-states per loop iteration.  That may not seem like a lot, but if
you skipped past 100 strings, that would be 2100 t-states!  Note that the
extra preloading added 7 t-states to the startup time for the loop, but time
would still be saved even if it only looped once.  And the routine is the
same size.
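
If you want to try any of the versions above, here is a minimal sketch of a
caller.  The label elementNames and the three strings are made up for
illustration, and .db assumes a TASM-style assembler.  Every version takes
the same inputs, though the last two also trash DE:

 ld hl,elementNames  ; hl -> first string in the table (hypothetical label)
 ld a,2              ; string number, counting from 0
 call getString      ; skips past two zero terminators
                     ; now hl -> "Lithium" and a = 0

elementNames:
 .db "Hydrogen",0    ; string 0
 .db "Helium",0      ; string 1
 .db "Lithium",0     ; string 2

Keeping each string (terminator included) to 255 bytes or less is the safe
assumption here, since the final version only reloads C before each CPIR and
leaves B as whatever it was on entry.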

TANSTATFC!

--
David Phillips <electrum@tfs.net>
ICQ: 13811951
AOL/AIM: Electrum32
86 Central: http://www.tfs.net/~electrum/
"There ain't no such thing as the fastest code!" -- Michael Abrash

-----Original Message-----
From: Henry Davidowich <rdaneelolivaw@hotmail.com>
To: assembly-83@lists.ticalc.org <assembly-83@lists.ticalc.org>
Date: Thursday, November 05, 1998 5:19 PM
Subject: A83: Joe Wingerbermuhle?


>
>Hey Joe, do you mind if I borrow this code?  I found it in Ahmed
>El-Helw's Periodic Table 2.0 (great program!).
>
>;---------= Point hl to string a =---------
>; by: Joe Wingerbermuhle
>; Thanks, this is a lot easier than my
>; method of multiplying string # * 12
>;
>; Input: a=string number (0 to 255)
>; hl->string data
>; Output: hl->string
>
>getString:
> or a
> ret z
> ld b,a
> xor a
>getStringL1:
> push bc
> ld c,-1
> cpir
> pop bc
> djnz getStringL1
> ret
>