## Thursday, May 11, 2017

### Apple II - Double Hi-Res From The Ground UpPart 8: Drawing lines with an 8-bit Bresenham algorithm 2 A big shoutout to the Folks at OpenApple for covering my humble blog. You can check out the episode here
The story so far...

After a lot of background work we now have the basis for two of the fundamental features of a graphics package: Drawing a point and drawing a line. These basic elements: points, lines, rectangles, etc.. are often referred to collectively as primitives.

Our line drawing algorithm works but can only handle lines where X1 > X0, Y1 > Y0 and DX > DY. So lets fix that!

## So what happens when DY > DX?

The answer is simple! The Y axis becomes the major axis for our line. In other words, our line moves farther along the y-axis than the x-axis. Let's take a look at just part of the code from last post:
```NXTLINE LDA Y1 ;Calculate DY as Y1-Y0
SEC
SBC Y0
STA DY
LDA X1 ;Calculate DX as X1-X0
SEC
SBC X0
STA DX
LSR ; Set epsilon to DX/2
EOR #\$FF
STA EPSILON
LDX X0 ;Start plotting at (X0,Y0)
LDY Y0
NXTVERT PUT DH.ROWSET ;Calculate horizontal value
NXTHORZ PHY
PUT DH.PLOT ;Plot point
PLY
CPX X1 ;Are we done?
BEQ ENDLINE
INX ;Move along major axis
LDA EPSILON ;Epsilon = Epsilon + DY
CLC
STA EPSILON
BCC NXTHORZ ;If Epsilon hasn't rolled over move to next point
SEC ;If epsilon has rolled over subtract DX
SBC DX
STA EPSILON
INY ;Move along minor axis
BRA NXTVERT```
Notice that here, when X is the major axis we are doing five things:
1. We set epsilon to 255-DX/2 (lines 9-11)
2. We compare the X co-ordinate with X1 to see if were done drawing. (lines 18-19)
3. We advance the X co-ordinate. (line 20)
4. We add DY to epsilon at each iteration to determine if we need to move along the minor (Y) access (lines 21-25)
5. If so, we subtract DX from epsilon and advance the Y co-ordinate (lines 26-29)
When DY is larger than DX, our algorithm changes to:
1. Set epsilon to 255-DY/2 (lines 5-7)
2. Compare the Y co-ordinate with Y1 to see if were done drawing. (lines 18-19)
3. Advance the Y co-ordinate. (line 20)
4. Add DX to epsilon at each iteration to determine if we need to move along the minor (X) access (lines 21-25)
5. If so, we subtract DY from epsilon and advance the X co-ordinate (lines 26-29)
That code would look something like this:
```NXTLINE LDA Y1 ;Calculate DY as Y1-Y0
SEC
SBC Y0
STA DY
LSR ;Set epsilon to 255-DY/2
EOR #\$FF
STA EPSILON
LDA X1 ;Calculate DX as X1-X0
SEC
SBC X0
STA DX
LDX X0 ;Start plotting at (X0,Y0)
LDY Y0
NXTVERT PUT DH.ROWSET ;Calculate horizontal value
NXTHORZ PHY
PUT DH.PLOT ;Plot point
PLY
CPY Y1 ;Are we done?
BEQ ENDLINE
INY ;Move along major axis
LDA EPSILON ;Epsilon = Epsilon + DX
CLC
STA EPSILON
BCC NXTVERT ;NOTE: we've changed Y so we need to call DH.ROWSET
SEC ;If epsilon has rolled over subtract DX
SBC DY
STA EPSILON
INX ;Move along minor axis
BRA NXTVERT```
If you recall, last post I mentioned that one of the ways we can implement Bresenham's algorithm is to write multiple routines and that's exactly what we are going to do here. The combined code looks like this:
```NXTLINE LDX X0
LDY Y0
LDA Y1 ;Calculate DY as Y1-Y0
SEC
SBC Y0
STA DY
LDA X1 ;Calculate DX as X1-X0
SEC
SBC X0
STA DX
CMP DY
BCS DXLINE
DYLINE LDA DY ;Set EPSILON to 255-(DY/2)
LSR
EOR #\$FF
STA EPSILON
DYHORZ PUT DH.ROWSET ;Calculate horizontal value
PHY
PUT DH.PLOT ;Plot point
PLY
CPY Y1 ;Are we done?
BEQ ENDLINE
INY ;Move along major axis
LDA EPSILON ;Epsilon = Epsilon + DX
CLC
STA EPSILON
BCC DYHORZ ;NOTE: we've changed Y so we need to call DH.ROWSET
SEC ;If epsilon has rolled over subtract DX
SBC DY
STA EPSILON
INX ;Move along minor axis
BRA DYHORZ
DXLINE LDA DX ;Set EPSILON to 255-(DX/2)
LSR
EOR #\$FF
STA EPSILON
DXVERT PUT DH.ROWSET ;Calculate horizontal value
DXHORZ PHY
PUT DH.PLOT ;Plot point
PLY
CPX X1 ;Are we done?
BEQ ENDLINE
INX ;Move along major axis
LDA EPSILON ;Epsilon = Epsilon + DY
CLC
STA EPSILON
BCC DXHORZ ;If Epsilon hasn't rolled over move to next point
SEC ;If epsilon has rolled over subtract DX
SBC DX
STA EPSILON
INY ;Move along minor axis
BRA DXVERT
ENDLINE RTS```
Keep in mind that we are still assuming that X0 < X1 and that Y0 < Y1. But what if that's not the case? Well if you think about it, as far as drawing the line goes the only difference between X0 > X1 and X0 < X1 is that we move along the X axis in the opposite direction. The proportion of steps on the X axis to the Y axis doesn't change. The same applies for the Y axis.

Clearly we could solve this problem by writing more routines but considering that there are two cases - that gives us four possibilities and each one would have to handle situations where DX > DY and DY < DX. Implementing each of these would yield eight separate routines!

## So is there another way?

After all, what we really need to do is change the direction we are moving along one or both axes. We even know how to implement that! Just flip our INX to a DEX for the X axis and our INY to DEY to do the same thing for the Y axis.

To accomplish this we're going to, once again use self-modifying code. While we are calculating X1 - X0, we will check if the result is less than zero - in other words the carry bit will be cleared. At which point we will do two things: flip the sign back from negative to positive (accomplished with an EOR \$FF) and then modify the INX/INY instructions in our two routines as needed.

This last part will require us to create a few more labels. CHGYA and CHGYB pointing to the two INY instructions and CHGXA and CHGXB for the two INX instructions.

The result looks like this:(self-modifying code and the portions that will be modified marked out in yellow)
```NXTLINE LDX X0
LDY Y0
LDA Y1 ;Calculate DY as Y1-Y0
SEC
SBC Y0
BCC Y0BIGGER
STA DY
LDA #\$C8 ;SMC: Hex code for INY
BRA CHGY
Y0BIGGER EOR #\$FF
STA DY
LDA #\$88 ;SMC: Hex code for DEY
CHGY STA CHGYA ;SMC: Modify instructions at CHGYA/CHGYB
STA CHGYB
LDA X1 ;Calculate DX as X1-X0
SEC
SBC X0
BCC X0BIGGER
STA DX
LDA #\$E8 ;SMC: Hex code for INX
BRA CHGX
X0BIGGER EOR #\$FF
STA DX
LDA #\$CA ;SMC: Hex code for DEX
CHGX STA CHGXA ;Modify instructions at CHGXA/CHGXB
STA CHGXB
LDA DX
CMP DY
BCS DXLINE
DYLINE LDA DY ;Set EPSILON to 255-(DY/2)
LSR
EOR #\$FF
STA EPSILON
DYHORZ PUT DH.ROWSET ;Calculate horizontal value
PHY
PUT DH.PLOT ;Plot point
PLY
CPY Y1 ;Are we done?
BEQ ENDLINE
CHGYA INY ;SMC: Move along major axis
LDA EPSILON ;Epsilon = Epsilon + DX
CLC
STA EPSILON
BCC DYHORZ ;NOTE: we've changed Y so we need to call DH.ROWSET
SEC ;If epsilon has rolled over subtract DX
SBC DY
STA EPSILON
CHGXA INX ;SMC: Move along minor axis
BRA DYHORZ
DXLINE LDA DX ;Set EPSILON to 255-(DX/2)
LSR
EOR #\$FF
STA EPSILON
DXVERT PUT DH.ROWSET ;Calculate horizontal value
DXHORZ PHY
PUT DH.PLOT ;Plot point
PLY
CPX X1 ;Are we done?
BEQ ENDLINE
CHGXB INX ;SMC: Move along major axis
LDA EPSILON ;Epsilon = Epsilon + DY
CLC
STA EPSILON
BCC DXHORZ ;If Epsilon hasn't rolled over move to next point
SEC ;If epsilon has rolled over subtract DX
SBC DX
STA EPSILON
CHGYB INY ;SMC: Move along minor axis
BRA DXVERT
ENDLINE RTS```
...and we now have a fully functioning line drawing routine! There's just one problem. Remember our DH.COLOUR macro? It modifies the DH.PLOT code! So what will happen here when we call SETDCOLOR?
```*DH.COLOUR
*Sets colour for DH.PLOT routine.  Requires DH.PLOT.
*Jonathan Graham/Battlestations
*Free for non-commercial use with attribution
*
LDA CLOM,Y ;Lookup low byte of MAIN memory colour table
STA ]ORMAIN+1 ;Update the ORA instruction
LDA CHIM,Y ;Lookup high byte of MAIN memory colour table
STA ]ORMAIN+2  ;Update the ORA instruction
LDA CLOA,Y ;Lookup low byte of AUX memory colour table
STA ]ORAUX+1 ;Update the ORA instruction
LDA CHIA,Y ;Lookup high byte of AUX memory colour table
STA ]ORAUX+2 ;Update the ORA instruction```
If you recall the ]ORMAIN and ]ORAUX variables take on the value they were defined with most recently in the code. Since they are defined in the DH.PLOT macro. These variables will point to the part of the line draw routine where DX > DY. Which means for our other code, which is used when DX < DY our line will be the default colour: green.

This problem can be solved in a number of ways but to preserve the modularity of our code we're going to be a little less than efficient. Our code, as it stands will modify the correct ORMAIN/ORAUX codes in the DXLINE routine (used when DX > DY). All we need is another to change the code in the DYLINE routine. If you take a look in DH.PLOT you will see that ]ORMAIN is 13 bytes from the begining, and ORAUX is 31 bytes from the beginning. So all we need is one more label at the point where we PUT DH.PLOT called PLOTDYL and then use the pseudo opcode EQU to do the following
```SETDCOLOR TAY
PUT DH.COLOUR
]ORMAIN EQU PLOTDYL+13 ;Redefine ORMAIN to point to DYLINE routine
]ORAUX EQU PLOTDYL+31 ;Redefine ORAUX to point to DYLINE routine
PUT DH.COLOUR
RTS```
By placing the label PLOTDYL right where te second DH.PLOT macro is and then using the EQU's as above. We've tricked the assembler into making our second macro modify the DYLINE rounte....and that's it.

## Are we on the right track?

Our original goals were to produce code that was not just understandable but also fast.  So exactly how fast is this code we are writing anyway? When benchmarking code in the real world. I always find it handy to compare my results as well as the value of any improvements I'm about to make by comparing them to the theoretical maximum for whatever technique I'm using.

As we've seen, in our approach addressing a single pixel on the DHR screen requires us to load the screen data from the place we are about to plot, clear the bits we need to use with AND, insert our pixel data with OR then write the value back to screen memory.  At it's cheapest a LDA from some arbitrary memory location is 3 machine cycles, an AND is two, an OR is two more and a store is 4 more again.  This brings us to 11 machine cycles.  Which means we could fill the entire screen with a pixel of a particular colour in 295,680 cycles or about .39 seconds on a 1 Mhz machine.  Which is about 3 FPS.

This figure may seem small but keep in mind that in a real game you almost never address every point on the screen in a single frame. Furthermore a single pixel plot is considerably - about seven times - less efficient than drawing a bitmap on the screen a byte at a time.

I should also stress that what we are calculating is very much a unobtainable result since to draw a whole screen would require a stream of instructions about 107K in length which would consume all of the memory on a typical Apple //c or //e.

Another, perhaps more common benchmark is to compare your code against other applications or implementations. For the sake of completeness I've gathered a few popular DHR tools. In each case I've written the fastest code I could manage with the goal of filling the entire screen by drawing horizontal lines from left to right. Once the screen is filled we change colours and start again. After cycling through all 16 colours we divide our time by 16 to get the average time to fill an entire screen.

## So without further ado here are our challengers...

Beagle Graphics - Created by the ever popular Beagle Bros. It contains a number of tools for drawing graphics on the DHR screen from basic. Here I used the following program:
&MODE(2):FOR C = 0 TO 15:&HCOLOR=C:FOR Y = 0 to 191:HPLOT 0,Y to 139,Y: NEXT Y,C:

Graphics Magician - Developed in 1982 by Penguin Software, this program was the basis for many graphical adventure games. It allows people to create drawings using a special program which records various primitive graphical operations (line draws, fills, text). The user can then save these in a file and redraw them on the screen from BASIC or assembly using a free-to-use engine written in assembly. In this case I reverse-engineered the file format and created a file with a series of line drawing commands covering the entire screen. This is followed by a command to change colour and another screens-worth of line drawing commands.

cc65 - One of the more popular solutions for doing cross-compilation for the Apple II. I adapted routines from Bill Buckles PDF on drawing DHR graphics using CC65 to write the following code:
```int main(void)
{
int c,i;
videomode(VIDEOMODE_80COL);
clrscr();
dhireson();
dhiresclear();
for (c=0;c<16;c++)
for (i=0;i<192;i++)
dhrfbox(0, i, 139, i,c);
return 0;
}```
The source code and disk images for Bill's library can be found here

The results speak for themselves:

ApplicationScreen fill% of Theoretical
Theoretical0.39 seconds100%
Our Application2 seconds19.5%
cc653.4 seconds11%
Beagle Graphics10 seconds3.9%
Graphics Magician (DHR)24 seconds1.62%

In my experience 20% of the theoretical max on your first go is quite a good showing. We also appear to have beat out all of the other approaches we tested including commercial applications.

## Can we do any better?

The answer is a definite "yes". You will notice that every time we plot a point we push the Y register to preserve it's contents, then we PULL it to get the value back so we can update it. However when DX > DY the majority of the time this is entirely unnecessary. Making this change brings us down to 1.85s or 20.1% of our theoretical max!

Another optimization comes from the fact that the values DX, DY and EPSILON never change once the line drawing has begun. Which means we could, by means of self-modifying code replace all the places where we use these values with absolute addressing rather than zero page addressing. This would save us about three cycles per plot.

## Large scale vs small scale patterns.

All the optimizations I've just mentioned are all what I call "small scale" they involve taking the existing approach - in this case plotting a single point at a time - and making it run more tightly.  A large scale optimization is where we change the way we are solving the problem.

For example, if we know we are going to be plotting lines and that lines generally contain more than one point.  We could re-write our plot routine to draw two pixels at a time. We could handle lines with an odd number of pixels by writing a special case.

If you look closely at the CC65 program you will see that we are using the dhrfbox function. This was actually designed for drawing filled boxes instead of straight lines. Why did we use it then? Because it uses this trick extensively, and will draw just about two pixels per plot by writing a whole byte to the screen at a time. If we were to use the line drawing function from the CC65 library it would be about 20 times slower!

Ok that's it for this post, next up we will build out a simple API for the drawing routine as well as an demo app doing some 3D style drawing. After that I think we should cover writing a character generator. 