; Cases 8x8x4 sample
;   Worst Case = 394 + (499 * YLength) cycles : 4'386 cycles
;   Best Case  = 394 + (331 * YLength) cycles : 3'042 cycles
;   Real Case  = 394 + ((331 + 12 * BitDisp) * YLength)
;
; Notice the worst case is faster than the best case of the routines I've seen.
; The worst case is a completely non-aligned sprite, ex : (15, Y).
; The best case is an aligned sprite, ex : (16, Y).
;
; Note also that the clipping routines take fewer cycles than shown here. They take
; more cycles to start up, but once the patches are applied, they run much faster since they
; have fewer pixels to write.
;
; It's also possible to calculate the Mask table on the fly, supposing you assume that
; color "00b" means transparent. My game "Lemmingz" does this and it uses 8 cycles
; less for both cases. I'll let you do the modifications yourself :)
; It's very simple: instead of reading the mask table, the value normally found in that
; table is computed as (Color1 or Color2).
;
; If you read my document "Z80Optimization" you should be able to understand all the code I
; wrote in here. It uses *a lot* of code patching. Without it, the routine would
; surely have grown by 13 * BitDisp * YLen cycles. That's a heck of a lot of cycles.
;
; Note : Your Destination address must be a multiple of 400h for optimization
;        reasons. Most addresses already used for grayscale (FC00h & 4000h)
;        already satisfy this, so it shouldn't prove a big problem.
;        Also, your sprite should be arranged so that you have the 2 grey
;        layers followed by the OR mask. So if you have an 8xY sprite,
;        it takes 3 * Y bytes (3 * 8 * Y bits), since there are 3 layers.
;
; Anyone could optimize this more, I'm sure there are still a few places to optimize :)

; These values only serve as an example, you may change them as you see fit!
; Here I'm using the typical pages which are used by games for grayscale
Dest            equ 0FC00h                      ; First gray destination
Delta           equ (0CA00h - 0FC00h) / 100h    ; Difference between the 2 gray pages / 100h

; Procedure to draw a 8xYx4 Sprite created by Christopher Tremblay
; The fastest routine I know for now!
; [In]   : hl = Source, b = X, c = Y, a = YLength (< 128)
; [Out]  : b = 0, a = 15
; [Uses] : af, bc, de, hl, ix
;<[]>---<[]>---<[]>---<[]>---<[]>---<[]>---<[]>---<[]>---<[]>---<[]>---<[]>---<[]>---<[]>---<[]>
DrawSprite:
    ld (SpriteLen), a
    ld (GrayDelta), a
    add a, a
    ld (Addr1), a               ; Patch the length(s)
    rra
    ld e, a
    add a, c
    dec a
    ret m
    jr nc, IsLow
    inc a
    ld d, 0
    ld (SpriteLen), a
    sub e
    neg
    ld e, a
    add hl, de
    ld c, d                     ; It's being clipped from the top!
IsLow:
    push hl
    pop ix                      ; Load ix with hl
    ld a, c
    add a, a
    add a, a
DrawPage:
    ld h, Dest / 400h           ; (*1)
    ld l, a
    ld a, b
    cp 0
    jp p, NotFixup              ; Should we clip left?
    xor a
NotFixup:
    rra
    add hl, hl
    rra
    add hl, hl
    rra                         ; a = X / 8
    cp 15
    jr nz, NoClipRight
    add a, l
    ld l, a                     ; Get rid of a
    ld a, SecondPixel - FirstPixel - 2
    ld (FirstPixel), a
    ld a, 16
    ld (SecondPixel), a
    xor a
    ld (SecondPixel + 1), a
    jr AdjustH
NoClipRight:
    add a, l
    ld l, a
AdjustH:
    ld a, b
    cp 0
    jp p, NotClipLeft           ; Should we clip left?
    cp -7
    jp m, Return                ; Nothing to draw, we're overflowing!
    ld a, FirstPixel - DoFirstPixel
    ld (DoFirstPixel), a
    ld a, b
    dec hl
NotClipLeft:
    and %00000111
    ld b, a
    add a, a
    add a, b                    ; a = (X and 7) * 3
    sub 3 * 7                   ; Skip the left portion of it
    neg
SpriteLen       equ $ + 1
    ld b, 8                     ; b = loop length (patched)
    ld (MDrawY1), a
    ld (MDrawY2), a
    xor a                       ; Clear the carry flag
    ld e, a
StartMDrawLoop:                 ; Here b will always be 0 :)
    ld e, 0
    ld d, e                     ; hl = Source
GrayDelta       equ $ + 2
    ld a, (ix + 8)              ; cd = 2nd Gray Layer
MDrawY1         equ $ + 1       ; Heavy code patching occurs here! :o)
    jr $ + 2
    rra
    rr d
    rra
    rr d
    rra
    rr d
    rra
    rr d
    rra
    rr d
    rra
    rr d
    rra
    rr d                        ; Shift a & d accordingly
    ld (OldA), a
    ld a, (ix)                  ; ab = 1st Gray Layer
MDrawY2         equ $ + 1       ; Heavy code patching occurs here! :o)
    jr $ + 2
    rra
    rr e
    rra
    rr e
    rra
    rr e
    rra
    rr e
    rra
    rr e
    rra
    rr e
    rra
    rr e                        ; Shift a & e accordingly
DoFirstPixel    equ $ + 1
    jr $ + 2
    ld (NewA), a
Addr1           equ $ + 2
    ld a, (ix + 16)
    ld c, a
    and (hl)
NewA            equ $ + 1
    or 0                        ; Preserve the "a" value
    ld (hl), a                  ; Write the first pixel
    ld a, h
    add a, Delta
    ld h, a
    ld a, c
    and (hl)
OldA            equ $ + 1
    or 0
    ld (hl), a                  ; Write the second layer's last pix
    ld a, h
    sub Delta
    ld h, a
FirstPixel      equ $ + 1
    jr $ + 2
    inc hl
    ld a, c
    and (hl)
    or e
    ld (hl), a                  ; Write the next pixel
    ld a, h
    add a, Delta
    ld h, a
    ld a, c                     ; Restore the mask
    and (hl)
    or d
    ld (hl), a                  ; Write the second layer's first pix
SecondPixel     equ $ + 1
    ld de, 10010h - (Delta * 100h) - 1
    add hl, de
    inc ix
    djnz StartMDrawLoop
Return:
    xor a
    ld (FirstPixel), a
    ld (DoFirstPixel), a
    ld a, 15
    ld (SecondPixel), a
    ld hl, 10010h - (Delta * 100h) - 1
    ld (SecondPixel), hl
    ret
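
; Usage sketch (not part of the original routine). It only illustrates the
; calling convention listed in the [In] line of DrawSprite. The labels
; DrawBall and BallSprite, the coordinates and the sprite bytes are made up
; for illustration. The .db directive is an assumption (TASM style); use
; whatever data directive your assembler provides.
;
; The data layout follows the offsets the routine patches: YLength bytes read
; through (ix), YLength bytes read through (ix + YLength), then YLength mask
; bytes read through (ix + 2 * YLength), one byte per row. The routine ANDs
; the mask byte with the screen and then ORs in the layer data, so the 0FFh
; mask chosen here simply keeps the whole background.
DrawBall:
    ld hl, BallSprite           ; hl = Source
    ld b, 40                    ; b  = X
    ld c, 20                    ; c  = Y
    ld a, 8                     ; a  = YLength (< 128)
    call DrawSprite
    ret

BallSprite:
    .db 3Ch, 7Eh, 0FFh, 0FFh, 0FFh, 0FFh, 7Eh, 3Ch       ; 1st gray layer, 8 rows
    .db 00h, 3Ch, 7Eh, 7Eh, 7Eh, 7Eh, 3Ch, 00h           ; 2nd gray layer, 8 rows
    .db 0FFh, 0FFh, 0FFh, 0FFh, 0FFh, 0FFh, 0FFh, 0FFh   ; Mask, 8 rows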