We are going to start coding the CPU emulation, which is really the main portion of this project. So before we begin, we need to analyse in great detail what's the emulated CPU and how it works. By searching for information on the Internet, we can find on these sites: http://www.cpu-world.com/Arch/8080.html and http://en.wikipedia.org/wiki/Intel_8080.

Note:

This chapter assumes that you already understand binary calculation, hexadecimal, bytes, bit, etc.
If you don't, you might want to read a bit on the subject. See for example this page: http://en.wikipedia.org/wiki/Bitwise_operation

 

  • The CPU runs at 2Mhz, and each instructions can take 4 (or more) clock cycles to execute. This gives us an approximative 500000 instructions per second. If you recall from chapter 3, we are going to run at 60 frames per second. This theorically allows about a maximum of 500000/60 = 8300 instructions per frame. However, many instructions run slower than that, we are probably going to run less than that.
  • The 8080 is a 8/16 bit processor: its data bus is 8-bit. This means that it can transfer 8 bit at a time to and from memory. The address bus is 16-bit, which mean that it can access up to 65536 bytes of memory. (However, this space invaders game only uses 16KB of addressable memory.)
  • The CPU registers are:
    • PC (16-bit) Program Counter. This register always point to the next instruction to be executed in memory, normally in ROM
    • SP (16-bit) Stack Pointer. This register should be set to point somewhere in RAM. The stack grows 'downward', meaning that it should be set at the end of the physical RAM.
    • A (8-bit) Accumulator. The accumulator is always used to store mathematical operation results. Almost every instructions read or write to this register.
    • B, C, D, E, H, L (8-bit) general purpose registers.
    • BC (16-bit) Virtual register: this register is in fact the combination of register B and C, with B being the 'most significant byte' and C the 'least significant byte'. In Blitzmax code, we could say that BC = (B * 256) + C. Or better yet using fast shift: BC = (B shl 8) + C
    • DE (16-bit) same as BC, but uses D and E.
    • HL (16-bit) same as BC, but uses H and L.
  • There is also a couple of flags (boolean value True / False) that control or get updated when the CPU perform a mathematical operation, such as adding two values.
    • Sign: Result was negative (i.e.: Bit number 7 was set)
    • Zero: Result was zero
    • Carry: Result caused an overflow (i.e.: if you add 255 + 1, it 'overflows' at returns 0)
    • Half carry (or auxiliary carry): Similar to Carry, but check overflow from bit 3 to bit 4. (i.e.: if you add 15 + 1 = 16).
    • Parity: Result contains an even number of bit set. (i.e.: %0100001).
    • Interrupt: Set to True if interrupts are currently enabled
  • Supports for up to 256 input and output ports.
  • Interrupts are controlled by the Interrupt flag: when an interrupt occurs, the CPU calls the subroutine, and returns when it ends. The stack is used to store the current PC (program counter).

Ok. That's a lot of information. Now let's translate this into code. Open up cpu.bmx. We need to have an internal representation of all the registers. Insert this piece into the TCpu type:

    Field PC: Short 'Program Counter: Current instruction pointer. 16-bit register.
    Field SP: Short 'Stack Pointer. 16-bit register
    Field A: Byte 'Accumulator. 8-bit register
    Field B: Byte 'Register B. 8-bit register
    Field C: Byte 'Register C. 8-bit register
    Field D: Byte 'Register D. 8-bit register
    Field E: Byte 'Register E. 8-bit register
    Field H: Byte 'Register H. 8-bit register
    Field L: Byte 'Register L. 8-bit register
    Field BC: Short 'Virtual register BC (16-bit) combinaison of registers B and C
    Field DE: Short 'Virtual register DE (16-bit) combinaison of registers D and E
    Field HL: Short 'Virtual register HL (16-bit) combinaison of registers H and L
    Field SIGN: Byte 'Sign flag
    Field ZERO: Byte 'Zero flag
    Field HALFCARRY: Byte 'Half-carry (or Auxiliary Carry) flag
    Field PARITY: Byte 'Parity flag
    Field CARRY: Byte 'Carry flag
    Field INTERRUPT: Byte 'Interrupt Enabled flag
    Field CRASHED:Byte 'Special flag that tells if CPU is crashed (i.e.: stopped)

As you can see, every register is declared by either a Byte or a Short depending on its size (8-bit vs 16-bit). You can notice an intruder: the CRASHED flag. This is not really a CPU flag, it will be used to stop the emulation in case something wrong happens while executing code. We'll also need accessor methods for these registers. Add these methods:

    Method SetA(inByte:Byte)
        A = inByte
    End Method
    Method SetB(inByte:Byte)
        B = inByte
        BC = (B Shl 8) + C
    End Method
    Method SetC(inByte:Byte)
        C = inByte
        BC = (B Shl 8) + C
    End Method
    Method SetD(inByte:Byte)
        D = inByte
        DE = (D Shl 8) + E
    End Method
    Method SetE(inByte:Byte)
        E = inByte
        DE = (D Shl 8) + E
    End Method
    Method SetH(inByte:Byte)
        H = inByte
        HL = (H Shl 8) + L
    End Method
    Method SetL(inByte:Byte)
        L = inByte
        HL = (H Shl 8) + L
    End Method
    Method SetBC(inShort:Short)
        BC = inShort
        B = BC Shr 8
        C = BC
    End Method
    Method SetDE(inShort:Short)
        DE = inShort
        D = DE Shr 8
        E = DE
    End Method
    Method SetHL(inShort:Short)
        HL = inShort
        H = HL Shr 8
        L = HL
    End Method
    Method SetSP(inShort:Short)
        SP = inShort
    End Method

If you look closely, you'll see that SetA() and SetSP() simply set their corresponding register. Every other methods set more than one register. Why is that? Ah! That's because our friend, the Intel 8080, has got some virtual registers! We call them virtual because they don't contain a value of their own. Take BC for example. The value of BC can be calculated by fitting B and C together in 16-bit. To emulate this behavior, I'm making the SetB() and SetC() functions not only set their register, but also re-evaluate the virtual value of BC. The reverse is true: calling SetBC() will not only set the virtual 16-bit value, but also split back into B and C to keep them syncronize at all time. Whew.

What's next? Remember the input and output ports? We need to know what this game actually uses in order to declare only what's needed. Normally, to find that out we would need to have the emulated game run and carefully investigate what and when it reads or writes to any of the available ports. However, by experimentating and searching on the Internet, I have found that the game uses only 3 input ports (port number 1, 2, 3) and 5 output ports (port number 2, 3, 4, 5, 6). We already have created a file called io.bmx where input and output should be handled. But before we switch to that file, let's create a brand new one called util.bmx where we can put general purpose utility functions and stuff that could be accessed by every modules in our project. Here, I have set a couple of functions. Enter this in your new file:

SuperStrict

Const BIT0:Byte=1
Const BIT1:Byte=2
Const BIT2:Byte=4
Const BIT3:Byte=8
Const BIT4:Byte=16
Const BIT5:Byte=32
Const BIT6:Byte=64
Const BIT7:Byte=128

Function HByte$(value:Byte)
    Return Right$(Hex$(value), 2)
End Function

Function HWord$(value:Short)
    Return Right$(Hex$(value), 4)
End Function

Function HBool$(value:Byte)
    If value Then
        Return "True"
    Else
        Return "False"
    End If
End Function

Function EndianFlip:Int(value:Int)
' Depending on CPU endianness, swap (or not) a 32-bit value
? LittleEndian ' Intel x86 are littleendian: we need to swap byte order
    Local temp:Int
    Local src:Byte Ptr = Varptr(value)
    Local dst:Byte Ptr = Varptr(temp)
    dst[0] = src[3]
    dst[1] = src[2]
    dst[2] = src[1]
    dst[3] = src[0]
    Return temp
? BigEndian ' PowerPC are bigendian: no need to swap
    Return value
?
End Function

Let me explain. The 8 BIT constants are calculated like this: (1 shl x).  They represent the decimal value of each bit in a byte. This will come in handy later on when we'll have to manipulate bits. The 3 next global functions will be used to display debugging information on the screen. HByte$() outputs a two-character string in hexadecimal representing one byte of data (from $00 to $FF). HWord$() does the same thing, but for short (or word as Blitzmax calls them), from $0000 to $FFFF. The last one, HBool$() outputs True or False depending on the value of the parameter. It will be used to display the current value of a flag. And now we have EndianFlip: this function is general purpose: it is used to 'normalize' the byte order in a 32-bit value. It's because Intel CPU, such as Pentium or Core Duo, are Little Endian: they store values in a reverse order. You can read a bit about this subject here: http://en.wikipedia.org/wiki/Little_endian. That's the reason why if you create RGBA Pixmap on a Windows machine running an Intel processor and try to draw a red pixel, the 32-bit int value you have to provide is $000000FF (which seems like ABGR).

Enough, now let's proceed with io.bmx. At the moment it contains pretty much nothing useful. Scrap everything out and replace its content entirely with this:

SuperStrict
Import "util.bmx"
Import "sound.bmx"

Global io: TIO = New TIO

Type TIO
    ' Input and output ports
    Field OUT_PORT2:Byte
    Field OUT_PORT3:Byte
    Field OUT_PORT4LO:Byte
    Field OUT_PORT4HI:Byte
    Field OUT_PORT5:Byte
    Field IN_PORT1:Byte
    Field IN_PORT2:Byte

    Method Init()
        ' Dipswitch: BIT0 and BIT1 controls starting number of life (from 3 to 6)
        ' BIT7 prints additionnal coin message on intro screen
        IN_PORT2 = 0
        IN_PORT2 = IN_PORT2 | (BIT0 | BIT1)
        IN_PORT2 = IN_PORT2 | (BIT7)
       
        DebugLog("I/O initialized.")
    End Method
   
    Method Update()
        ' Clear player input bits
        IN_PORT1 = IN_PORT1 & (~(BIT0 | BIT1 | BIT2 | BIT4 | BIT5 | BIT6))
        IN_PORT2 = IN_PORT2 & (~(BIT2 | BIT4 | BIT5 | BIT6))
       
        If KeyHit(KEY_1) Then IN_PORT1 = IN_PORT1 | BIT2 ' Player 1 start
        If KeyHit(KEY_2) Then IN_PORT1 = IN_PORT1 | BIT1 ' Player 2 start
        If KeyHit(KEY_T) Then IN_PORT2 = IN_PORT2 | BIT2 ' Tilt detection trigger
       
        ' Move left: same key used for player 1 and 2
        If KeyDown(KEY_LEFT) Then
            IN_PORT1 = IN_PORT1 | BIT5
            IN_PORT2 = IN_PORT2 | BIT5
        End If
        ' Move right: same key used for player 1 and 2
        If KeyDown(KEY_RIGHT) Then
            IN_PORT1 = IN_PORT1 | BIT6
            IN_PORT2 = IN_PORT2 | BIT6
        End If
        ' shoot button: same key used for player 1 and 2
        If KeyDown(KEY_SPACE) Then
            IN_PORT1 = IN_PORT1 | BIT4
            IN_PORT2 = IN_PORT2 | BIT4
        End If
       
        ' Insert coin: we play a sound effect here just for fun:
        ' this sound is NOT produced by the machine
        If KeyHit(KEY_3) Or KeyHit(KEY_5) Then
            IN_PORT1 = IN_PORT1 | BIT0
            sound.PlayCoin()
        End If
    End Method

    Method OutputPort(port: Byte, value: Byte)
        Select port
            ' Port 2 simply stores a 8-bit value
            Case 2
                OUT_PORT2 = value
            ' Port 3 controls play of various sound effects
            Case 3
                If(value & BIT0) Then
                    sound.StartUfo()
                Else
                    sound.StopUfo()
                End If
           
                If((value & BIT1) And Not (OUT_PORT3 & BIT1)) sound.PlayShot()
                If((value & BIT2) And Not (OUT_PORT3 & BIT2)) sound.PlayBaseHit()
                If((value & BIT3) And Not (OUT_PORT3 & BIT3)) sound.PlayInvHit()
                If((value & BIT4) And Not (OUT_PORT3 & BIT4)) sound.PlayExtraLife()
                If((value & BIT5) And Not (OUT_PORT3 & BIT5)) sound.PlayBeginPlay()
                OUT_PORT3 = value

            ' Port 4 stores a 16-bit value (by storing two 8-bit values)
            Case 4
                OUT_PORT4LO = OUT_PORT4HI
                OUT_PORT4HI = value

            ' Port 5 also controls sound
            Case 5
                If((value & BIT0) And Not (OUT_PORT5 & BIT0)) sound.PlayWalk1()
                If((value & BIT1) And Not (OUT_PORT5 & BIT1)) sound.PlayWalk2()
                If((value & BIT2) And Not (OUT_PORT5 & BIT2)) sound.PlayWalk3()
                If((value & BIT3) And Not (OUT_PORT5 & BIT3)) sound.PlayWalk4()
                If((value & BIT4) And Not (OUT_PORT5 & BIT4)) sound.PlayUfoHit()
                OUT_PORT5 = value
        End Select
    End Method
   
    Method InputPort:Byte(port: Byte)
        Local result: Byte
        Select port
            ' Player 1 Input keys:
            Case 1
                result = IN_PORT1
    
            ' Player 2 Input keys And dip switches:
            Case 2
                result = IN_PORT2
   
            ' Port 3 calculate using values stored in output port 2 and 4
            Case 3
                result = (((OUT_PORT4HI Shl 8) | OUT_PORT4LO) Shl OUT_PORT2) Shr 8
        End Select
       
        Return result
    End Method
End Type

Oh my there is a lot going on here. Let's take a look slowly. First we declare all the required input and output ports as bytes. Let's explain the input ports:

  • IN_PORT1: Input port 1. This port is used as 8 1-bit flags as follow:
    • Bit 0: Coin insertion trigger
    • Bit 1: Start button: Two players
    • Bit 2: Start button: One player
    • Bit 3: not used
    • Bit 4: Player one: fire button
    • Bit 5: Player one: move left
    • Bit 6: Player one: move right
    • Bit 7: not used
  • IN_PORT2: Input port 2. Like port 1, it contains 8 1-bit flags:
    • Bit 0: Hardware dipswitch: starting number of lives (from 3 to 6)
    • Bit 1: Hardware dipswitch: combines with bit 0
    • Bit 2: Tilt detection trigger
    • Bit 3: not used
    • Bit 4: Player two: fire button
    • Bit 5: Player two: move left
    • Bit 6: Player two: move right
    • Bit 7: Hardware dipswitch: coin message display
  • Input port 3: This input port is special. It does not contain a value on its own. It is used to calculate a value based on values first entered into ports 2 and 4. The game hardware probably uses this trick because the CPU was too slow to evaluate it. You can look at the actual calculation in InputPort() under 'Case 3'.

If you analyse what's going on inside Update(), you will see how your actual computer keyboard input is getting 'translated' into the game hardware input port bits. Interestingly, the game allows for different keys between player one and player two. However I'm setting both inputs at the same time so you can control both players if you eventually start a two player game. Another very interesting info is that the game actually have a tilt detection mecanism, like a pinball arcade! I can't test it on a real Space Invaders, but according to this emulator, if you would kick the machine hard enough, it would 'Tilt' and you'd be game over :) Let's have a look at the output ports now:

  • OUT_PORT2: output port 2. Stores a value used by the special calculator invoked thru input port 3.
  • OUT_PORT3: output port 3. Controls various sound events:
    • Bit 0: If set to true, start UFO looping sound. If false, stop it
    • Bit 1: When triggered, play player shot sound
    • Bit 2: When triggered, play player hit sound
    • Bit 3: When triggered, play invader hit sound
    • Bit 4: When triggered, play Extra life sound
    • Bit 5: When triggered, play begin play sound
    • Bit 6: not used
    • Bit 7: not used
  • OUT_PORT4 (LO/HI): output port 4. Stores two 8-bit values used by input port 3.
  • OUT_PORT5: output port 5. Also controls vairous sound events:
    • Bit 0: When triggered, play invader walk1 sound
    • Bit 1: When triggered, play invader walk2 sound
    • Bit 2: When triggered, play invader walk3 sound
    • Bit 3: When triggered, play invader walk4 sound
    • Bit 4: When triggered, play ufo hit sound
    • Bit 5: not used
    • Bit 6: not used
    • Bit 7: not used
  • OUT_PORT6: output port 6. I haven't figured what this port does exactly. It appears to output values while game runs. It could be an interesting investigation to figure it out. But for now, this port is not handled by the emulator.

Read the code you entered to make sure you pretty much understand what's going on. Generally speaking, Update() takes keyboard input and translate them into input port bits. OutputPort() performs the request action according to which port was called with what value. InputPort() fetches a value and returns the result to the game. As said earlier, input port 3 is rather cryptic. The value it calculates is used to draw moving objects on the game screen by computing the resulting offset. No need to worry about it at all.

Ok, we have side-tracked a bit. Now it's time to go in cpu.bmx at last and start emulating some code. First, let's import our general-purpose utility file, we'll need it. Just after SuperStrict, insert:

Import "util.bmx"

Next, add the following fields:

    Field instruction_per_frame: Int = 4000 ' Approximate real machine speed
    Field current_inst:Byte

    ' Addionnal debug fields, not used by CPU
    Field disassembly_pc: Short
    Field debug_output$

Now locate the method Run(). It is empty. Let us make it a bit more useful and add some useful methods at the same time:

    Method Run()
        DrawText("-> " + cpu.debug_output$, 0, 20)

        ' Run about one frame worth of instruction code
        For Local i:Int=0 Until instruction_per_frame
            If CRASHED Then Return
            ExecuteInstruction()
        Next
    End Method
   
    Method ExecuteInstruction()
        disassembly_pc = PC
        current_inst = FetchRomByte()
       
        Select current_inst
            Default
                Disassembly("Undefined instruction: "+HByte$(current_inst))
                CRASHED = True
        End Select 
    End Method

    Method FetchRomByte:Byte()
        Local b:Byte = memoryptr[PC]
        PC :+ 1
        Return b
    End Method

    Method FetchRomShort:Short()
        Local s:Short = (memoryptr[PC+1] Shl 8) + (memoryptr[PC])
        PC :+ 2
        Return s
    End Method
   
    Method Disassembly(inText$)
        debug_output$ = HWord$(disassembly_pc)+": " + inText$
    End Method

Explanation time. Run() will try to execute one frame worth of CPU instructions by calling ExecuteInstruction() in a loop. Our field instruction_per_frame = 4000 controls the emulated speed of the CPU. If you set this value too low, the game will not run correctly. Too fast will also breaks the game. This is really a trial-and-error until you find a sweet spot where the game appears to function perfectly. The reality is that each instruction normally takes a variable number of clock cycles, and a game could be timed in such a strict way that emulation would be very very hard. But this game seems 'friendly' enough and will tolerate a little too slow or little too fast CPU emulation.  ExecuteInstruction() is responsible for actually decoding and executing one CPU instruction at a time. At the moment, it doesn't understand any instruction at all. We're going to add a couple in a minute. The two FetchRom methods takes one or two bytes from the game memory and moves the PC. Since every instruction on this Intel 8080 CPU takes exactly one byte, we are only reading one byte at a time. However we will find eventually take some instruction requires an additionnal parameter that can be either a byte or a word. The last method, Disassembly(), is just a debug helper to display the last executed instruction so we can see what's happening inside our emulator. Alright, all is well, it's time to compile and run that baby to see if we made any mistake. If everything works, the game should start and print on the screen:

-> 0000: Undefined instruction: 00

Congratulations! It 'works'. If we want to get a little bit further, we'll have to start emulating instructions. Let's look at what the engine is telling us. The first $0000 is the 16-bit hexadecimal address of the instruction that was about to be executed. It is the first byte in the ROM, which is where the CPU 'boots'. The instruction it was trying to execute was $00. To know what this instruction number means, we need a list. Here's one on the 'net: http://www.comsci.us/cpu/8080/isindex.html or a more condensed view: http://www.comsci.us/cpu/8080/matrix.html. Great, it appears that $00 means NOP. Let's add it inside the 'select' block in ExecuteInstruction():

            Case $00
                Instruction_NOP()

..and we'll need to add the corresponding method somewhere:

    Method Instruction_NOP()
        'Disassembly("NOP")
    End Method

Easy? This instruction does nothing, so emulating it is fairly trivial. Inside the method I've left a commented command. This Disassembly() command could be useful if you would like to output a continuous trace of what instruction were executed. For now I'll keep them commented so our emulator can run faster. Let's compile and run our project now that we fully emulate the $00 instruction...

-> 0003: Undefined instruction: C3

Great! What's $C3 now? From the previous web site we see that:  C3  JMP   Addr   Jump to Direct Address. Let's emulate the JMP instruction and its 'family'. Add this inside the 'select' block, right after the NOP instruction we did above:

             Case $c3, $c2, $ca, $d2, $da, $f2, $fa
                Instruction_JMP()

..and again, the method itself:

     Method Instruction_JMP()
        'Local name$
        Local condition:Byte = True
        Local data16:Short = FetchRomShort()
   
        Select current_inst
            Case $c3
                'name = "JMP"
            Case $c2
                'name = "JNZ"
                condition = Not ZERO
            Case $ca
                'name = "JZ"
                condition = ZERO
            Case $d2
                'name = "JNC"
                condition = Not CARRY
            Case $da
                'name = "JC"
                condition = CARRY
            Case $f2
                'name = "JP"
                condition = Not SIGN
            Case $fa
                'name = "JM"
                condition = SIGN
        End Select
   
        'Disassembly(name$+" "+HWord$(data16)+"  condition="+condition)
        If condition Then
            PC = data16
        End If
    End Method

Again, you can see that some lines are commented out: this is to speed up emulation. These lines could be uncommented to trace or debug emulation. Now there's some things that need to be explained with this method. First, by looking at the code, you can see that not only our interesting $c3 opcode is being handled here but a few related ones too. This will happen frequently since a lot of instructions have several opcode numbers associated with them. In fact, the instruction opcode not only tells the CPU what instruction to execute, but also what parameters or condition flag to use. In case of JMP here, you can see that in the end it simply sets PC. However, there are condition flags can be checked to see if the JMP should occur or not. JZ for exemple will jump only if the Zero Flag is True. Also of interest here is that the JMP instruction by itself doesn't tell where to jump: that's why we are calling FetchRomShort() at the beginning of the method: we are fetching a 16-bit address value, thus making PC advance an additionnal two bytes.

The keen of eye may notice an important note: in the instruction-set website, there are three jump opcodes that we are not handling here: JPO, JPE, and PCHL. PCHL is simply a different command so it should be taken care of by another method. However, JPO and JPE are ordinary conditonnal jump commands: I am omitting them intentionally because I haven't coded the Parity Flag emulation at all. Space Invaders does not use the parity flag, so any instruction regarding parity are not going to be handled by our emulator. This creates an important distinction here: We are coding a Space Invaders emulator, not a fully general purpose Intel 8080 emulator. Trying to run a different game using another ROM might not work because of that.

That being said, let's compile & run now to see what happens next!

-> 18D4: Undefined instruction: 31

Wow, we did really jump to a new address; look at our current PC: $18D4. The game's trying to execute opcode $31 and it is not handled by our select block. Oh well, that's enough data for today, off I go.

I you feel brave enough, you may try to implement instruction $31 by experimenting. In the next chapter, you'll see what it should be.

Chapter 5: The CPU emulation, continued

Back to tutorial home page