We are going to start coding the CPU emulation, which is really the main portion of this project. So before we begin, we need to analyse in great detail what's the emulated CPU and how it works. By searching for information on the Internet, we can find on these sites: http://www.cpu-world.com/Arch/8080.html and http://en.wikipedia.org/wiki/Intel_8080.
Note:
This chapter assumes that you already understand binary calculation, hexadecimal, bytes, bit, etc.
If you don't, you might want to read a bit on the subject. See for example this page: http://en.wikipedia.org/wiki/Bitwise_operation
- The CPU runs at 2Mhz, and each instructions can take 4 (or more) clock cycles to execute. This gives us an approximative 500000 instructions per second. If you recall from chapter 3, we are going to run at 60 frames per second. This theorically allows about a maximum of 500000/60 = 8300 instructions per frame. However, many instructions run slower than that, we are probably going to run less than that.
- The 8080 is a 8/16 bit processor: its data bus is 8-bit. This means that it can transfer 8 bit at a time to and from memory. The address bus is 16-bit, which mean that it can access up to 65536 bytes of memory. (However, this space invaders game only uses 16KB of addressable memory.)
- The CPU registers are:
- PC (16-bit) Program Counter. This register always point to the next instruction to be executed in memory, normally in ROM
- SP (16-bit) Stack Pointer. This register should be set to point somewhere in RAM. The stack grows 'downward', meaning that it should be set at the end of the physical RAM.
- A (8-bit) Accumulator. The accumulator is always used to store mathematical operation results. Almost every instructions read or write to this register.
- B, C, D, E, H, L (8-bit) general purpose registers.
- BC (16-bit) Virtual register: this register is in fact the combination of register B and C, with B being the 'most significant byte' and C the 'least significant byte'. In Blitzmax code, we could say that BC = (B * 256) + C. Or better yet using fast shift: BC = (B shl 8) + C
- DE (16-bit) same as BC, but uses D and E.
- HL (16-bit) same as BC, but uses H and L.
- There is also a couple of flags (boolean value True / False) that control or get updated when the CPU perform a mathematical operation, such as adding two values.
- Sign: Result was negative (i.e.: Bit number 7 was set)
- Zero: Result was zero
- Carry: Result caused an overflow (i.e.: if you add 255 + 1, it 'overflows' at returns 0)
- Half carry (or auxiliary carry): Similar to Carry, but check overflow from bit 3 to bit 4. (i.e.: if you add 15 + 1 = 16).
- Parity: Result contains an even number of bit set. (i.e.: %0100001).
- Interrupt: Set to True if interrupts are currently enabled
- Supports for up to 256 input and output ports.
- Interrupts are controlled by the Interrupt flag: when an interrupt occurs, the CPU calls the subroutine, and returns when it ends. The stack is used to store the current PC (program counter).
Ok. That's a lot of information. Now let's translate this into code. Open up cpu.bmx. We need to have an internal representation of all the registers. Insert this piece into the TCpu type:
Field PC: Short 'Program Counter: Current instruction pointer. 16-bit register.
Field SP: Short 'Stack Pointer. 16-bit register
Field A: Byte 'Accumulator. 8-bit register
Field B: Byte 'Register B. 8-bit register
Field C: Byte 'Register C. 8-bit register
Field D: Byte 'Register D. 8-bit register
Field E: Byte 'Register E. 8-bit register
Field H: Byte 'Register H. 8-bit register
Field L: Byte 'Register L. 8-bit register
Field BC: Short 'Virtual register BC (16-bit) combinaison of registers B and C
Field DE: Short 'Virtual register DE (16-bit) combinaison of registers D and E
Field HL: Short 'Virtual register HL (16-bit) combinaison of registers H and L
Field SIGN: Byte 'Sign flag
Field ZERO: Byte 'Zero flag
Field HALFCARRY: Byte 'Half-carry (or Auxiliary Carry) flag
Field PARITY: Byte 'Parity flag
Field CARRY: Byte 'Carry flag
Field INTERRUPT: Byte 'Interrupt Enabled flag
Field CRASHED:Byte 'Special flag that tells if CPU is crashed (i.e.: stopped)
As you can see, every register is declared by either a Byte or a Short depending on its size (8-bit vs 16-bit). You can notice an intruder: the CRASHED flag. This is not really a CPU flag, it will be used to stop the emulation in case something wrong happens while executing code. We'll also need accessor methods for these registers. Add these methods:
Method SetA(inByte:Byte)
A = inByte
End Method
Method SetB(inByte:Byte)
B = inByte
BC = (B Shl 8) + C
End Method
Method SetC(inByte:Byte)
C = inByte
BC = (B Shl 8) + C
End Method
Method SetD(inByte:Byte)
D = inByte
DE = (D Shl 8) + E
End Method
Method SetE(inByte:Byte)
E = inByte
DE = (D Shl 8) + E
End Method
Method SetH(inByte:Byte)
H = inByte
HL = (H Shl 8) + L
End Method
Method SetL(inByte:Byte)
L = inByte
HL = (H Shl 8) + L
End Method
Method SetBC(inShort:Short)
BC = inShort
B = BC Shr 8
C = BC
End Method
Method SetDE(inShort:Short)
DE = inShort
D = DE Shr 8
E = DE
End Method
Method SetHL(inShort:Short)
HL = inShort
H = HL Shr 8
L = HL
End Method
Method SetSP(inShort:Short)
SP = inShort
End Method
If you look closely, you'll see that SetA() and SetSP() simply set their corresponding register. Every other methods set more than one register. Why is that? Ah! That's because our friend, the Intel 8080, has got some virtual registers! We call them virtual because they don't contain a value of their own. Take BC for example. The value of BC can be calculated by fitting B and C together in 16-bit. To emulate this behavior, I'm making the SetB() and SetC() functions not only set their register, but also re-evaluate the virtual value of BC. The reverse is true: calling SetBC() will not only set the virtual 16-bit value, but also split back into B and C to keep them syncronize at all time. Whew.
What's next? Remember the input and output ports? We need to know what this game actually uses in order to declare only what's needed. Normally, to find that out we would need to have the emulated game run and carefully investigate what and when it reads or writes to any of the available ports. However, by experimentating and searching on the Internet, I have found that the game uses only 3 input ports (port number 1, 2, 3) and 5 output ports (port number 2, 3, 4, 5, 6). We already have created a file called io.bmx where input and output should be handled. But before we switch to that file, let's create a brand new one called util.bmx where we can put general purpose utility functions and stuff that could be accessed by every modules in our project. Here, I have set a couple of functions. Enter this in your new file:
SuperStrict
Const BIT0:Byte=1
Const BIT1:Byte=2
Const BIT2:Byte=4
Const BIT3:Byte=8
Const BIT4:Byte=16
Const BIT5:Byte=32
Const BIT6:Byte=64
Const BIT7:Byte=128
Function HByte$(value:Byte)
Return Right$(Hex$(value), 2)
End Function
Function HWord$(value:Short)
Return Right$(Hex$(value), 4)
End Function
Function HBool$(value:Byte)
If value Then
Return "True"
Else
Return "False"
End If
End Function
Function EndianFlip:Int(value:Int)
' Depending on CPU endianness, swap (or not) a 32-bit value
? LittleEndian ' Intel x86 are littleendian: we need to swap byte order
Local temp:Int
Local src:Byte Ptr = Varptr(value)
Local dst:Byte Ptr = Varptr(temp)
dst[0] = src[3]
dst[1] = src[2]
dst[2] = src[1]
dst[3] = src[0]
Return temp
? BigEndian ' PowerPC are bigendian: no need to swap
Return value
?
End Function
Let me explain. The 8 BIT constants are calculated like this: (1 shl x). They represent the decimal value of each bit in a byte. This will come in handy later on when we'll have to manipulate bits. The 3 next global functions will be used to display debugging information on the screen. HByte$() outputs a two-character string in hexadecimal representing one byte of data (from $00 to $FF). HWord$() does the same thing, but for short (or word as Blitzmax calls them), from $0000 to $FFFF. The last one, HBool$() outputs True or False depending on the value of the parameter. It will be used to display the current value of a flag. And now we have EndianFlip: this function is general purpose: it is used to 'normalize' the byte order in a 32-bit value. It's because Intel CPU, such as Pentium or Core Duo, are Little Endian: they store values in a reverse order. You can read a bit about this subject here: http://en.wikipedia.org/wiki/Little_endian. That's the reason why if you create RGBA Pixmap on a Windows machine running an Intel processor and try to draw a red pixel, the 32-bit int value you have to provide is $000000FF (which seems like ABGR).
Enough, now let's proceed with io.bmx. At the moment it contains pretty much nothing useful. Scrap everything out and replace its content entirely with this:
SuperStrict
Import "util.bmx"
Import "sound.bmx"
Global io: TIO = New TIO
Type TIO
' Input and output ports
Field OUT_PORT2:Byte
Field OUT_PORT3:Byte
Field OUT_PORT4LO:Byte
Field OUT_PORT4HI:Byte
Field OUT_PORT5:Byte
Field IN_PORT1:Byte
Field IN_PORT2:Byte
Method Init()
' Dipswitch: BIT0 and BIT1 controls starting number of life (from 3 to 6)
' BIT7 prints additionnal coin message on intro screen
IN_PORT2 = 0
IN_PORT2 = IN_PORT2 | (BIT0 | BIT1)
IN_PORT2 = IN_PORT2 | (BIT7)
DebugLog("I/O initialized.")
End Method
Method Update()
' Clear player input bits
IN_PORT1 = IN_PORT1 & (~(BIT0 | BIT1 | BIT2 | BIT4 | BIT5 | BIT6))
IN_PORT2 = IN_PORT2 & (~(BIT2 | BIT4 | BIT5 | BIT6))
If KeyHit(KEY_1) Then IN_PORT1 = IN_PORT1 | BIT2 ' Player 1 start
If KeyHit(KEY_2) Then IN_PORT1 = IN_PORT1 | BIT1 ' Player 2 start
If KeyHit(KEY_T) Then IN_PORT2 = IN_PORT2 | BIT2 ' Tilt detection trigger
' Move left: same key used for player 1 and 2
If KeyDown(KEY_LEFT) Then
IN_PORT1 = IN_PORT1 | BIT5
IN_PORT2 = IN_PORT2 | BIT5
End If
' Move right: same key used for player 1 and 2
If KeyDown(KEY_RIGHT) Then
IN_PORT1 = IN_PORT1 | BIT6
IN_PORT2 = IN_PORT2 | BIT6
End If
' shoot button: same key used for player 1 and 2
If KeyDown(KEY_SPACE) Then
IN_PORT1 = IN_PORT1 | BIT4
IN_PORT2 = IN_PORT2 | BIT4
End If
' Insert coin: we play a sound effect here just for fun:
' this sound is NOT produced by the machine
If KeyHit(KEY_3) Or KeyHit(KEY_5) Then
IN_PORT1 = IN_PORT1 | BIT0
sound.PlayCoin()
End If
End Method
Method OutputPort(port: Byte, value: Byte)
Select port
' Port 2 simply stores a 8-bit value
Case 2
OUT_PORT2 = value
' Port 3 controls play of various sound effects
Case 3
If(value & BIT0) Then
sound.StartUfo()
Else
sound.StopUfo()
End If
If((value & BIT1) And Not (OUT_PORT3 & BIT1)) sound.PlayShot()
If((value & BIT2) And Not (OUT_PORT3 & BIT2)) sound.PlayBaseHit()
If((value & BIT3) And Not (OUT_PORT3 & BIT3)) sound.PlayInvHit()
If((value & BIT4) And Not (OUT_PORT3 & BIT4)) sound.PlayExtraLife()
If((value & BIT5) And Not (OUT_PORT3 & BIT5)) sound.PlayBeginPlay()
OUT_PORT3 = value
' Port 4 stores a 16-bit value (by storing two 8-bit values)
Case 4
OUT_PORT4LO = OUT_PORT4HI
OUT_PORT4HI = value
' Port 5 also controls sound
Case 5
If((value & BIT0) And Not (OUT_PORT5 & BIT0)) sound.PlayWalk1()
If((value & BIT1) And Not (OUT_PORT5 & BIT1)) sound.PlayWalk2()
If((value & BIT2) And Not (OUT_PORT5 & BIT2)) sound.PlayWalk3()
If((value & BIT3) And Not (OUT_PORT5 & BIT3)) sound.PlayWalk4()
If((value & BIT4) And Not (OUT_PORT5 & BIT4)) sound.PlayUfoHit()
OUT_PORT5 = value
End Select
End Method
Method InputPort:Byte(port: Byte)
Local result: Byte
Select port
' Player 1 Input keys:
Case 1
result = IN_PORT1
' Player 2 Input keys And dip switches:
Case 2
result = IN_PORT2
' Port 3 calculate using values stored in output port 2 and 4
Case 3
result = (((OUT_PORT4HI Shl 8) | OUT_PORT4LO) Shl OUT_PORT2) Shr 8
End Select
Return result
End Method
End Type
Oh my there is a lot going on here. Let's take a look slowly. First we declare all the required input and output ports as bytes. Let's explain the input ports:
- IN_PORT1: Input port 1. This port is used as 8 1-bit flags as follow:
- Bit 0: Coin insertion trigger
- Bit 1: Start button: Two players
- Bit 2: Start button: One player
- Bit 3: not used
- Bit 4: Player one: fire button
- Bit 5: Player one: move left
- Bit 6: Player one: move right
- Bit 7: not used
- IN_PORT2: Input port 2. Like port 1, it contains 8 1-bit flags:
- Bit 0: Hardware dipswitch: starting number of lives (from 3 to 6)
- Bit 1: Hardware dipswitch: combines with bit 0
- Bit 2: Tilt detection trigger
- Bit 3: not used
- Bit 4: Player two: fire button
- Bit 5: Player two: move left
- Bit 6: Player two: move right
- Bit 7: Hardware dipswitch: coin message display
- Input port 3: This input port is special. It does not contain a value on its own. It is used to calculate a value based on values first entered into ports 2 and 4. The game hardware probably uses this trick because the CPU was too slow to evaluate it. You can look at the actual calculation in InputPort() under 'Case 3'.
If you analyse what's going on inside Update(), you will see how your actual computer keyboard input is getting 'translated' into the game hardware input port bits. Interestingly, the game allows for different keys between player one and player two. However I'm setting both inputs at the same time so you can control both players if you eventually start a two player game. Another very interesting info is that the game actually have a tilt detection mecanism, like a pinball arcade! I can't test it on a real Space Invaders, but according to this emulator, if you would kick the machine hard enough, it would 'Tilt' and you'd be game over :) Let's have a look at the output ports now:
- OUT_PORT2: output port 2. Stores a value used by the special calculator invoked thru input port 3.
- OUT_PORT3: output port 3. Controls various sound events:
- Bit 0: If set to true, start UFO looping sound. If false, stop it
- Bit 1: When triggered, play player shot sound
- Bit 2: When triggered, play player hit sound
- Bit 3: When triggered, play invader hit sound
- Bit 4: When triggered, play Extra life sound
- Bit 5: When triggered, play begin play sound
- Bit 6: not used
- Bit 7: not used
- OUT_PORT4 (LO/HI): output port 4. Stores two 8-bit values used by input port 3.
- OUT_PORT5: output port 5. Also controls vairous sound events:
- Bit 0: When triggered, play invader walk1 sound
- Bit 1: When triggered, play invader walk2 sound
- Bit 2: When triggered, play invader walk3 sound
- Bit 3: When triggered, play invader walk4 sound
- Bit 4: When triggered, play ufo hit sound
- Bit 5: not used
- Bit 6: not used
- Bit 7: not used
- OUT_PORT6: output port 6. I haven't figured what this port does exactly. It appears to output values while game runs. It could be an interesting investigation to figure it out. But for now, this port is not handled by the emulator.
Read the code you entered to make sure you pretty much understand what's going on. Generally speaking, Update() takes keyboard input and translate them into input port bits. OutputPort() performs the request action according to which port was called with what value. InputPort() fetches a value and returns the result to the game. As said earlier, input port 3 is rather cryptic. The value it calculates is used to draw moving objects on the game screen by computing the resulting offset. No need to worry about it at all.
Ok, we have side-tracked a bit. Now it's time to go in cpu.bmx at last and start emulating some code. First, let's import our general-purpose utility file, we'll need it. Just after SuperStrict, insert:
Import "util.bmx"
Next, add the following fields:
Field instruction_per_frame: Int = 4000 ' Approximate real machine speed
Field current_inst:Byte
' Addionnal debug fields, not used by CPU
Field disassembly_pc: Short
Field debug_output$
Now locate the method Run(). It is empty. Let us make it a bit more useful and add some useful methods at the same time:
Method Run()
DrawText("-> " + cpu.debug_output$, 0, 20)
' Run about one frame worth of instruction code
For Local i:Int=0 Until instruction_per_frame
If CRASHED Then Return
ExecuteInstruction()
Next
End Method
Method ExecuteInstruction()
disassembly_pc = PC
current_inst = FetchRomByte()
Select current_inst
Default
Disassembly("Undefined instruction: "+HByte$(current_inst))
CRASHED = True
End Select
End Method
Method FetchRomByte:Byte()
Local b:Byte = memoryptr[PC]
PC :+ 1
Return b
End Method
Method FetchRomShort:Short()
Local s:Short = (memoryptr[PC+1] Shl 8) + (memoryptr[PC])
PC :+ 2
Return s
End Method
Method Disassembly(inText$)
debug_output$ = HWord$(disassembly_pc)+": " + inText$
End Method
Explanation time. Run() will try to execute one frame worth of CPU instructions by calling ExecuteInstruction() in a loop. Our field instruction_per_frame = 4000 controls the emulated speed of the CPU. If you set this value too low, the game will not run correctly. Too fast will also breaks the game. This is really a trial-and-error until you find a sweet spot where the game appears to function perfectly. The reality is that each instruction normally takes a variable number of clock cycles, and a game could be timed in such a strict way that emulation would be very very hard. But this game seems 'friendly' enough and will tolerate a little too slow or little too fast CPU emulation. ExecuteInstruction() is responsible for actually decoding and executing one CPU instruction at a time. At the moment, it doesn't understand any instruction at all. We're going to add a couple in a minute. The two FetchRom methods takes one or two bytes from the game memory and moves the PC. Since every instruction on this Intel 8080 CPU takes exactly one byte, we are only reading one byte at a time. However we will find eventually take some instruction requires an additionnal parameter that can be either a byte or a word. The last method, Disassembly(), is just a debug helper to display the last executed instruction so we can see what's happening inside our emulator. Alright, all is well, it's time to compile and run that baby to see if we made any mistake. If everything works, the game should start and print on the screen:
-> 0000: Undefined instruction: 00
Congratulations! It 'works'. If we want to get a little bit further, we'll have to start emulating instructions. Let's look at what the engine is telling us. The first $0000 is the 16-bit hexadecimal address of the instruction that was about to be executed. It is the first byte in the ROM, which is where the CPU 'boots'. The instruction it was trying to execute was $00. To know what this instruction number means, we need a list. Here's one on the 'net: http://www.comsci.us/cpu/8080/isindex.html or a more condensed view: http://www.comsci.us/cpu/8080/matrix.html. Great, it appears that $00 means NOP. Let's add it inside the 'select' block in ExecuteInstruction():
Case $00
Instruction_NOP()
..and we'll need to add the corresponding method somewhere:
Method Instruction_NOP()
'Disassembly("NOP")
End Method
Easy? This instruction does nothing, so emulating it is fairly trivial. Inside the method I've left a commented command. This Disassembly() command could be useful if you would like to output a continuous trace of what instruction were executed. For now I'll keep them commented so our emulator can run faster. Let's compile and run our project now that we fully emulate the $00 instruction...
-> 0003: Undefined instruction: C3
Great! What's $C3 now? From the previous web site we see that: C3 JMP Addr Jump to Direct Address. Let's emulate the JMP instruction and its 'family'. Add this inside the 'select' block, right after the NOP instruction we did above:
Case $c3, $c2, $ca, $d2, $da, $f2, $fa
Instruction_JMP()
..and again, the method itself:
Method Instruction_JMP()
'Local name$
Local condition:Byte = True
Local data16:Short = FetchRomShort()
Select current_inst
Case $c3
'name = "JMP"
Case $c2
'name = "JNZ"
condition = Not ZERO
Case $ca
'name = "JZ"
condition = ZERO
Case $d2
'name = "JNC"
condition = Not CARRY
Case $da
'name = "JC"
condition = CARRY
Case $f2
'name = "JP"
condition = Not SIGN
Case $fa
'name = "JM"
condition = SIGN
End Select
'Disassembly(name$+" "+HWord$(data16)+" condition="+condition)
If condition Then
PC = data16
End If
End Method
Again, you can see that some lines are commented out: this is to speed up emulation. These lines could be uncommented to trace or debug emulation. Now there's some things that need to be explained with this method. First, by looking at the code, you can see that not only our interesting $c3 opcode is being handled here but a few related ones too. This will happen frequently since a lot of instructions have several opcode numbers associated with them. In fact, the instruction opcode not only tells the CPU what instruction to execute, but also what parameters or condition flag to use. In case of JMP here, you can see that in the end it simply sets PC. However, there are condition flags can be checked to see if the JMP should occur or not. JZ for exemple will jump only if the Zero Flag is True. Also of interest here is that the JMP instruction by itself doesn't tell where to jump: that's why we are calling FetchRomShort() at the beginning of the method: we are fetching a 16-bit address value, thus making PC advance an additionnal two bytes.
The keen of eye may notice an important note: in the instruction-set website, there are three jump opcodes that we are not handling here: JPO, JPE, and PCHL. PCHL is simply a different command so it should be taken care of by another method. However, JPO and JPE are ordinary conditonnal jump commands: I am omitting them intentionally because I haven't coded the Parity Flag emulation at all. Space Invaders does not use the parity flag, so any instruction regarding parity are not going to be handled by our emulator. This creates an important distinction here: We are coding a Space Invaders emulator, not a fully general purpose Intel 8080 emulator. Trying to run a different game using another ROM might not work because of that.
That being said, let's compile & run now to see what happens next!
-> 18D4: Undefined instruction: 31
Wow, we did really jump to a new address; look at our current PC: $18D4. The game's trying to execute opcode $31 and it is not handled by our select block. Oh well, that's enough data for today, off I go.
I you feel brave enough, you may try to implement instruction $31 by experimenting. In the next chapter, you'll see what it should be.