The symptoms were as follows: the second part of MQM 3 - Total Brainstorm, from the MQM-Team, failed to "decrunch" and caused a reset on my machine (ZX-UNO).
I first tried to contact the MQM-Team members by posting a request. Meanwhile, I thought it would be handy to have some clues ready for whichever member of the so-called MQM-Team might show up, so I started debugging the code.
First I tried to narrow down the moment at which the bug manifested. Luckily, I can save and load snapshots on the ZX-UNO thanks to DivMMC and ESXDOS. I also used Spectaculator and SpecEmu. Using the integrated debuggers of these emulators, I made some snapshots at different points of execution. All of them worked on the ZX-UNO.
Then I moved earlier in the code each time until I found a point at which the snapshot failed to execute on the ZX-UNO. A snapshot taken a few instructions later worked fine. That led me to the following code fragment:
:61df ed 7e      im 2            ;start of 1st snapshot: fails under ZXUNO
:61e1 21 ff ff   ld hl,0xffff
:61e4 36 c9      ld (hl),0xc9
:61e6 f9         ld sp,hl
:61e7 3e 3b      ld a,0x3b
:61e9 ed 47      ld i,a
:61eb 3e 38      ld a,0x38
:61ed 06 08      ld b,0x08
:61ef d9         exx
:61f0 21 00 58   ld hl,0x5800
:61f3 11 01 58   ld de,0x5801
:61f6 01 ff 02   ld bc,0x02ff
:61f9 77         ld (hl),a
:61fa ed b0      ldir
:61fc d6 08      sub 0x08
:61fe d9         exx
:61ff fb         ei
:6200 76         halt
:6201 fb         ei
:6202 76         halt
:6203 fb         ei
:6204 76         halt
:6205 10 e8      djnz 0x61ef
:6207 cd 52 00   call 0x0052     ;start of the second snapshot: success under ZXUNO
:620a 3b         dec sp
:620b 3b         dec sp
:620c e1         pop hl
The code starts with interrupts disabled. Interrupt mode 2 is selected, and a quick interrupt handler is installed by setting I to $3B, so the interrupt handler address is read from address $3BFF. This address belongs to the ROM, in an area filled with FF bytes, so the final execution address for the interrupt handler is $FFFF. The code writes $C9 to this location. $C9 is the opcode of RET, so the interrupt handler is merely a RET instruction. When an interrupt is accepted, the CPU clears the interrupt enable flag, so after RETurning, interrupts are disabled. This is why there is an EI instruction just before each HALT: to make sure that HALT is executed with interrupts enabled.
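The vector lookup described above can be sketched in a few lines of Python. This is only an illustration: `im2_handler_address` and `read_byte` are hypothetical names, and the sketch assumes that the data bus reads $FF during interrupt acknowledge and that both $3BFF and $3C00 hold $FF.

```python
def im2_handler_address(i_reg, bus_byte, read_byte):
    """Return (vector address, handler address) for a Z80 in IM 2."""
    # The vector address is formed from I (high byte) and the bus value (low byte).
    vector = ((i_reg << 8) | bus_byte) & 0xFFFF
    # The handler address is the little-endian word stored at the vector.
    low = read_byte(vector)
    high = read_byte((vector + 1) & 0xFFFF)
    return vector, (high << 8) | low


def read_byte(addr):
    # Assumption: every byte we touch lies in the FF-filled ROM area.
    return 0xFF


vector, handler = im2_handler_address(0x3B, 0xFF, read_byte)
print(f"vector at ${vector:04X}, handler at ${handler:04X}")
```

With I = $3B the word is fetched from $3BFF/$3C00, which is why a single RET stored at $FFFF is enough to act as the handler.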
The code from 61EB to 6205 is just an attribute fadeout routine, that cycles from white, yellow, cyan, etc, to black, changing the paper value of all attribute cells (with LDIR). The sequence of three HALTs allows the visual effect to be synchronized with the vertical blanking period, so there's no flickering. Note that we exit the fadeout loop with interrupts disabled.
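As a rough check of the fadeout described above, the following Python sketch lists the attribute value written on each pass of the loop, starting at $38 and subtracting 8 for each of the 8 iterations. The function name `fade_steps` is made up for illustration; the colour names follow the standard Spectrum palette order.

```python
# Spectrum colour numbers 0..7, as used in bits 3-5 (paper) of an attribute byte.
PAPER_NAMES = ["black", "blue", "red", "magenta", "green", "cyan", "yellow", "white"]


def fade_steps(start=0x38, step=0x08, count=8):
    """Return the (attribute value, paper colour) pairs written by the loop."""
    value = start
    steps = []
    for _ in range(count):
        paper = (value >> 3) & 0x07      # paper colour lives in bits 3-5
        steps.append((value, PAPER_NAMES[paper]))
        value = (value - step) & 0xFF    # SUB 0x08
    return steps


for value, name in fade_steps():
    print(f"${value:02X} -> paper {name}")

# Each pass waits for three HALTs, i.e. three frames per colour step.
print("frames for the whole fade:", 3 * len(fade_steps()))
```

The ink bits stay at 0 throughout, so only the paper colour changes from one step to the next.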
A set of changes was applied to this snapshot to see what happened with each one (each applied to the original snapshot, not cumulatively):
1. Replaced LDIR with NOPs. Snapshot fails.
2. Patched the snapshot so that just after B is loaded with 8 (address 61ED), an immediate jump is performed to address 6207. Snapshot works.
3. Patched the three EI HALT sequences with NOPs. Snapshot works.
4. Patched two of the three EI HALT sequences with NOPs. Snapshot fails.
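Patches like the ones listed above can be applied to a snapshot file with a small script. The sketch below is not the tool actually used here; it assumes a 48K .SNA image (27-byte header followed by a RAM dump starting at $4000), and `patch_sna` is a hypothetical helper. The demo runs on a zero-filled stand-in image rather than the real snapshot.

```python
SNA_HEADER_SIZE = 27   # 48K .SNA: 27-byte register header...
RAM_START = 0x4000     # ...followed by 48 KB of RAM starting at $4000


def patch_sna(snapshot, address, new_bytes):
    """Return a copy of a 48K SNA image with new_bytes written at a Z80 address."""
    if not RAM_START <= address <= 0x10000 - len(new_bytes):
        raise ValueError("patch falls outside the RAM dump")
    offset = SNA_HEADER_SIZE + (address - RAM_START)
    patched = bytearray(snapshot)
    patched[offset:offset + len(new_bytes)] = new_bytes
    return bytes(patched)


# Stand-in image (all zeros) with the three EI/HALT pairs placed at $61FF.
image = bytes(SNA_HEADER_SIZE + 0xC000)
image = patch_sna(image, 0x61FF, bytes([0xFB, 0x76] * 3))

# Experiment 3: replace the three EI/HALT pairs with NOPs.
no_halts = patch_sna(image, 0x61FF, bytes(6))

offset = SNA_HEADER_SIZE + (0x61FF - RAM_START)
print(image[offset:offset + 6].hex(" "), "->", no_halts[offset:offset + 6].hex(" "))
```

Working on a copy each time keeps every experiment independent of the others, which matches the "not cumulative" approach.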
Suspecting that the bug might be related to the length of the INT pulse (being too long and therefore retriggering the interrupt handler when it is not supposed to), I changed the $C9 value written to $FFFF to $18, the opcode for JR. The displacement for this instruction is taken from address $0000 in ROM, which holds $F3, so this effectively jumps backwards into the code a dozen bytes or so. I put some NOPs and a RET there. Now the interrupt handler spends a few more T-states and won't be retriggered. Guess what? It didn't work.
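The landing address of that JR can be worked out with a short Python sketch. `jr_target` is a hypothetical helper; it only applies the usual rule that the signed displacement is added to the address of the instruction following the two-byte JR, with 16-bit wraparound.

```python
def jr_target(opcode_addr, displacement_byte):
    """Return the destination of a JR located at opcode_addr."""
    # The displacement is a signed 8-bit value.
    signed = displacement_byte - 0x100 if displacement_byte & 0x80 else displacement_byte
    # JR is two bytes long; the address after it wraps from $FFFF to $0001.
    next_pc = (opcode_addr + 2) & 0xFFFF
    return (next_pc + signed) & 0xFFFF


target = jr_target(0xFFFF, 0xF3)
print(f"JR at $FFFF with displacement $F3 lands at ${target:04X}")
```

The result is $FFF4, eleven bytes before $FFFF, which leaves room for a few NOPs and a RET at the top of RAM.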
The EI HALT issue was baffling. It failed, but I still had no idea why. I wanted to know what changed in memory on the ZXUNO versus the emulator just after the first part of this code had been executed (that is, when we are about to execute the instruction at address $6207). To get that, I needed to make a snapshot at the right moment. This is not a problem in an emulator, where you can just set a breakpoint, wait for it to trigger and then save the snapshot. For the ZXUNO, I patched the code beginning at $6203 with NOP, CALL $0066. That erases the third EI HALT sequence and the DJNZ instruction, so after returning from the CALL, we are at $6203.
Calling $0066 is like pressing the NMI button on the DivMMC. ESXDOS shows up and I can save a snapshot of the currently running program. I know that some values, like register R, won't be the same. The value of the B register won't be the same either, but I wasn't interested in register values, only in memory contents.
Comparing the snapshot taken with the emulator at $6203 with the one taken by ESXDOS, also at $6203, revealed some things:
- Memory was not exactly the same. Leaving aside the attribute area, which of course was different, there were a couple of changed bytes here and there, and some others at the end of the RAM.
- I remember having swapped the RAM from one snapshot to the other and getting no conclusive results (sorry, this part of the analysis was not written down). The thing is that I ruled out RAM contents as the cause of the failure. Then I focused on the register contents. The 27-byte header of a SNA file holds the contents of all CPU registers, interrupt state (disabled/enabled), interrupt mode, border color, etc. I knew some registers would have different values, but then I saw it...
... the snapshot from the emulator (the "good" one) stored "IM 2" as the current interrupt mode, but the snapshot from ESXDOS stored "IM 1" !!!!!!
How is that possible? The very first instruction executed is precisely IM 2.
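A header comparison like the one described above can be sketched in Python as follows. The field layout is the usual 27-byte 48K .SNA header; the names `parse_sna_header` and `diff_headers` are hypothetical, and the two headers in the demo are synthetic stand-ins, not the real snapshots.

```python
import struct

# I, then HL' DE' BC' AF' HL DE BC IY IX, IFF byte, R, AF, SP, IM, border.
SNA_HEADER = struct.Struct("<B9HBBHHBB")
FIELDS = ["i", "hl_alt", "de_alt", "bc_alt", "af_alt", "hl", "de", "bc",
          "iy", "ix", "iff", "r", "af", "sp", "im", "border"]


def parse_sna_header(data):
    """Decode the 27-byte SNA header into a dict of register values."""
    header = dict(zip(FIELDS, SNA_HEADER.unpack_from(data, 0)))
    header["iff2"] = bool(header["iff"] & 0x04)   # bit 2 holds IFF2
    return header


def diff_headers(a, b):
    """Return {field: (value_in_a, value_in_b)} for every differing field."""
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


# Synthetic headers that differ only in the interrupt mode byte.
emulator = SNA_HEADER.pack(0x3B, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFFFF, 2, 0)
esxdos = SNA_HEADER.pack(0x3B, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFFFF, 1, 0)

print(diff_headers(parse_sna_header(emulator), parse_sna_header(esxdos)))
```

On real snapshots, fields such as R or BC would be expected to show up in the diff as well; the interesting one is `im`.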
Then I noticed the opcode bytes used for IM 2: ED 7E. I know that some instructions, especially those that need an ED prefix, can be encoded with more than one opcode. Is ED 7E the official opcode for IM 2???
It's ED 5E
According to http://clrhome.org/table/ , IM 2 can be encoded as either ED 5E or ED 7E. I quickly dug into the very poorly documented (if at all) VHDL code of the T80 core (T80_MCode.vhd) and found this, for the decoding of the IM 2 instruction:
when "01011110"|"01110111" =>
    -- IM 2
    IMode <= "10";
I immediately changed that into:
when "01011110"|"01111110" =>
    -- IM 2
    IMode <= "10";
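Converting the bit patterns to hex makes the typo easy to see. This is only a quick check in Python; `branch_matches` is a made-up helper that mimics a single `when` branch and says nothing about how the rest of the T80 decoder treats the other opcodes.

```python
def branch_matches(patterns, opcode):
    """True if the second opcode byte equals one of the branch's bit patterns."""
    return opcode in {int(p, 2) for p in patterns}


buggy = ["01011110", "01110111"]
fixed = ["01011110", "01111110"]

print("buggy branch covers:", [f"ED {int(p, 2):02X}" for p in buggy])
print("fixed branch covers:", [f"ED {int(p, 2):02X}" for p in fixed])
print("ED 7E handled by buggy branch:", branch_matches(buggy, 0x7E))
print("ED 7E handled by fixed branch:", branch_matches(fixed, 0x7E))
```

The buggy second pattern is $77 rather than $7E, so the ED 7E used by the demo never reaches this branch, while the documented ED 5E still works.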
After this, I went through all versions of the ZXUNO Spectrum core and saw that nearly all of them use a T80 core with the same bug. Only one of them used a slightly different version, in which the IM 2 decoding was bug-free. In fact, I remember having run this MQM3 demo with no problems in the past.
BTW: the T80 used in the TBBlue core has the same bug. Luckily, the opcode $7E has not been taken for one of the new instructions, so it's very easy to fix.