Re: A89: matrix
Olle Hedman writes:
> you don't have much choice other than move.l (Ax)+,(Ay)+
Actually, you do. If you have lots of stuff to move, then a little extra
cost on the preparation/cleanup side doesn't matter if you can save
heaps on the actual transfer. In that case this might help:
; a0 = source, a1 = destination, d0 = number of loop iterations minus 1
movem.l d1-d7/a2-a6,-(sp) ; Save the 12 registers the copy clobbers
loop:
movem.l (a0)+,d1-d7/a2-a6 ; Suck in 48 bytes at once
movem.l d1-d7/a2-a6,(a1)+ ; Store them at destination
movem.l (a0)+,d1-d7/a2-a6 ; Suck next 48 bytes in
movem.l d1-d7/a2-a6,(a1)+ ; Store at destination
...
dbra d0,loop ; Decrement d0, loop until it reaches -1
movem.l (sp)+,d1-d7/a2-a6 ; Restore registers
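To make the loop-count bookkeeping concrete, here is a small Python model of the loop above (a sketch, not 68000 code). It assumes the block length is an exact multiple of the bytes moved per iteration and, by default, two movem.l pairs per iteration as shown; `dbra_iterations`, `initial_d0` and `movem_copy` are hypothetical names. The point it illustrates: dbra decrements the low word of d0 and falls through only when it reaches -1, so d0 must start at the iteration count minus 1.

```python
# Hypothetical Python model of the movem.l copy loop (not 68000 code).
CHUNK = 48  # bytes per movem.l pair: d1-d7/a2-a6 = 12 longwords


def dbra_iterations(d0: int) -> int:
    """How many times a dbra-terminated loop body runs for a starting d0.

    dbra decrements the low word of the register and branches unless the
    result is -1, so the body runs (low word + 1) times."""
    return (d0 & 0xFFFF) + 1


def initial_d0(total_bytes: int, pairs_per_iteration: int) -> int:
    """Starting d0 for copying total_bytes with the given unroll factor.

    Assumes total_bytes is an exact multiple of one iteration's payload."""
    per_iter = CHUNK * pairs_per_iteration
    if total_bytes <= 0 or total_bytes % per_iter:
        raise ValueError(f"block size must be a positive multiple of {per_iter} bytes")
    iterations = total_bytes // per_iter
    if iterations > 0x10000:
        raise ValueError("dbra can run the body at most 65536 times")
    return iterations - 1


def movem_copy(src: bytes, pairs_per_iteration: int = 2) -> bytearray:
    """Copy src the way the loop does: 48-byte chunks, dbra-controlled."""
    dst = bytearray(len(src))
    d0 = initial_d0(len(src), pairs_per_iteration)
    a0 = a1 = 0  # source and destination offsets, like (a0)+ and (a1)+
    while True:
        for _ in range(pairs_per_iteration):
            regs = src[a0:a0 + CHUNK]  # movem.l (a0)+,d1-d7/a2-a6
            a0 += CHUNK
            dst[a1:a1 + CHUNK] = regs  # movem.l d1-d7/a2-a6,(a1)+
            a1 += CHUNK
        d0 = (d0 - 1) & 0xFFFF  # dbra: decrement the low word...
        if d0 == 0xFFFF:        # ...and fall through once it hits -1
            break
    return dst
```

For example, a 960-byte block with two pairs per iteration needs 10 iterations, so d0 starts at 9.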
With the move.l method you waste 1 bus cycle on the insn fetch for
every 4 bytes moved (the 4 bytes themselves cost 4 bus cycles: two
word reads and two word writes). With the movem.l method you waste 4
cycles (each movem.l is two words: opcode plus register mask) for
every 48 bytes moved; that is, your bus bandwidth loss drops from 20%
to only 7.7%. Of course, if your block size is known a priori and is
small enough to warrant fully unrolling the loop, then d0 becomes free,
so you can move up to 52 bytes per 2 insns, which further decreases
the bandwidth waste to 7.1%. If you need absolutely everything that
is possible, you can disable interrupts, save a7 to some known
location and include a7 in the transfer too; your wasted bandwidth
will reach an all-time low of 6.7%.
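The percentages follow from a simple cost model: on the 68000's 16-bit bus every word access is one bus cycle, a move.l (a0)+,(a1)+ is one word long, and a movem.l is two words. The Python sketch below (a hypothetical `fetch_overhead` helper) reproduces the four figures above under that model; like the estimate in the text, it ignores the dbra and any other loop overhead.

```python
from fractions import Fraction


def fetch_overhead(fetch_words: int, bytes_moved: int) -> Fraction:
    """Fraction of bus cycles spent fetching instructions.

    Model: 16-bit bus, one bus cycle per word. Every byte moved is read once
    and written once, so bytes_moved bytes cost bytes_moved data cycles
    (bytes_moved/2 word reads plus bytes_moved/2 word writes)."""
    data_cycles = bytes_moved
    return Fraction(fetch_words, fetch_words + data_cycles)


# (instruction words fetched, bytes moved) for each variant in the text
CASES = {
    "move.l (a0)+,(a1)+": (1, 4),
    "2x movem.l, d1-d7/a2-a6 (12 regs)": (4, 48),
    "2x movem.l, + d0 (13 regs, unrolled)": (4, 52),
    "2x movem.l, + a7 (14 regs, irqs off)": (4, 56),
}

if __name__ == "__main__":
    for name, (fetch, nbytes) in CASES.items():
        print(f"{name:<40}{float(fetch_overhead(fetch, nbytes)):.1%}")
```

Using exact fractions keeps the ratios (1/5, 1/13, 1/14, 1/15) visible before rounding.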
It's an old trick that was worth doing on a 68000. With the advent of
the 68010 it went out of fashion: the 68010 and all later CPUs
have a loop mode (or equivalent) in which data blocks can be moved
without insn fetches slowing down the transfer (i.e. your copy speed
is limited only by the actual transfer speed of the bus). On the old
68000, however, the above method was quite popular when you needed
those few extra bus cycles.
Regards,
Zoltan