WIP: Reversi
Re: WIP: Reversi
Hi,
That's how I do it.
In the map-demo version you have, I made the first transfer with the screen on.
That caused the errors on reset, so I have changed it.
Everything else I write ONLY in the VBI.
I know from TI and Memotech programmers that this must work; the Memotech is the fastest Z80 machine with the TMS chip.
Please try this in your game.
And DON'T give up!
Re: WIP: Reversi
I'm sorry to say that with every change I make, it works worse and worse. The version with multiple NOPs is so far the least buggy, compared to versions waiting for VBL, using various flags, queueing calls and so on.
Obviously I'm NOT giving up, just that I will need more time to work on this and future Creativision releases. Perhaps it is my programming model that is wrong, that I would need to redesign the main loop in order to have it work properly.
If nothing else, it implies that the step from "working in the emulator" to "working on real hardware" may be bigger than one would expect, depending on how well you understand the VDP and how to program it.
Re: WIP: Reversi
Yes, it is a difficult step from emulator to real hardware. On the CV it is more difficult because of the fast CPU and RAM in combination with the TMS.
Most people don't understand how difficult it can be.
But we learn with every new machine, and we are all waiting for your GREAT game.
So take your time.
Re: WIP: Reversi
Is the 2 MHz 6502 really that much faster than a 3.58 MHz Z80, or is the VDP interfaced in a different way that makes accessing it faster through memory mapped I/O than via port I/O on the Z80?
Re: WIP: Reversi
Yes, a 6502 at 1 MHz is roughly like a Z80 at 4 MHz (that is, the A version).
A 6502 at 2 MHz is roughly like a Z80 at 8 MHz.
Yes, the Z80 has more registers and more instructions, but most of them take more cycles than on the 6502.
RAM access is also mostly slower on the Z80, and the 6502 needs fast RAM.
But on any computer or console these errors can happen when you write to the TMS at the same time as it is building the display.
I am no expert on this CPU comparison stuff, but I have talked a lot over the last few days with TI people and some Memotech MTX programmers, and this is what they told me.
Re: WIP: Reversi
If you look at Appendix E of the TMS99xx PDF and read the examples for the TMS99xx CPU, you will see that they also contain NOPs, with a comment along the lines of "don't write too fast", because this CPU is also very fast.
On Z80 machines this is not needed because the RAM access itself is so slow that the TMS is always ready.
But in the end it is best to write only in the VBI.
Re: WIP: Reversi
Some more information from a TI thread on AtariAge that may be interesting. It says the same thing: only fast machines can overrun the VDP. Only access in the VBI is safe.
Here is the text:
The VDP delay question, we had a huge investigation on the Yahoo group a few years ago. I'm the one who put a logic analyzer on the bus and measured instruction and turnaround time to the VDP. On a standard console, whether you are in scratchpad RAM or not, writes can not overrun the VDP because of the additional delay induced by the CPU's read-before-write cycle. Reads appear to be unable to overrun the VDP, unless you use the fastest possible instruction time (IIRC, it was MOVB R0,R1 -- note no indirection and no increment). However, there is a case where after setting the VDP address, and then doing a fast read instruction (such as one involving only registers), you MIGHT overrun the VDP. (For instance, write the address then immediately do a MOVB R1,<anything>, where R1 points to the VDP read data address). This is because the time between the write to the VDP for setting the address and the read from the VDP for getting the byte may be too short. Using an absolute address (MOVB @VDPRD,<anything>) seems to give you enough time with the extra memory read to be safe. As Mizapf notes, faster machines like the Geneve may need a delay (note the Geneve also has a different video chip), and accelerated TI consoles (faster crystal) may as well. These are relatively rare, however.
Emulators do not appear to care about the delay today. Classic99 does not at this time.
There ARE periods at which the delay is not needed. After VSYNC is the only predictable one (because there is no external indication of the other windows). The TMS9918 data manual notes that there are 4300uS of CPU access after a vertical blank starts -- some of this time is eaten by the TI interrupt routine if you leave it on. If the blank bit is enabled, so that the display is not being drawn, the CPU also gets full speed access to memory.
The reason for this is that CPU accesses to VRAM are given 'access windows' depending on the VDP's exact mode. There is always a 2uS delay, and then additional time is added depending on the current mode. You'll find the table on page 2-4 of the datasheet, but basically, in text mode you need 3.1uS, in bitmap or graphics mode you need 8uS, in multicolor you need 3.5uS, and when the display is off or during vertical blank, you need 2uS. Most 9900 instructions take far longer than these times, and so TI's recommendation was pretty much just a guarantee, as well as future-proofing for future, faster CPUs.
2. RMW - as Mizapf notes, there's no way to do it. If possible, it may be helpful to 'double buffer', keep a copy of a VDP memory area in CPU RAM, and then you only need to push, never read. I'm not sure that helps you for a line draw function, though.
3. Mentioned a few times, but yeah, SOC, SZC, ANDI and ORI are your best bet for bit manipulation. You can use a table of bits with SOC or SZC to remove the need for shifting
4. LI vs MOV is well covered. As an extra benefit, LI does not do a read-before-write when loading the register, one of the rare instructions that doesn't. When MOV can be considered for performance is when you replace @ONE with a register. Because memory performance is the biggest bottleneck on the TI, often the /length/ of your instruction is more important than the work it does.
5. Byte instructions on the 9900 are the same speed except when doing Workspace Register indirect autoincrement (*R1+) - a byte operation is 2 cycles faster.
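To put the access windows quoted above into CPU terms, here is a small host-side C sketch (not 6502 code) that converts them into a minimum number of CPU cycles between two VDP accesses. The window values are the ones from the quoted post; the 2 MHz clock is the figure mentioned earlier in this thread for the 6502; all function and type names are made up for illustration.

```c
#include <stdint.h>

/* Worst-case CPU access windows from the quoted post, in nanoseconds. */
enum vdp_mode { VDP_TEXT, VDP_GRAPHICS, VDP_MULTICOLOR, VDP_BLANK };

static const uint32_t vdp_window_ns[] = {
    [VDP_TEXT]       = 3100,  /* text mode                      */
    [VDP_GRAPHICS]   = 8000,  /* bitmap or graphics mode        */
    [VDP_MULTICOLOR] = 3500,  /* multicolor mode                */
    [VDP_BLANK]      = 2000,  /* display off or vertical blank  */
};

/* Smallest whole number of CPU cycles that lasts at least as long as
   the access window for the given mode (rounds up). */
uint32_t min_cycles_between_accesses(enum vdp_mode mode, uint32_t cpu_hz)
{
    uint64_t ns_times_hz = (uint64_t)vdp_window_ns[mode] * cpu_hz;
    return (uint32_t)((ns_times_hz + 999999999ULL) / 1000000000ULL);
}

/* Whole CPU cycles that fit into a window given in microseconds,
   e.g. the 4300 uS after the start of vertical blank (rounds down). */
uint32_t cycles_in_window(uint32_t window_us, uint32_t cpu_hz)
{
    return (uint32_t)(((uint64_t)window_us * cpu_hz) / 1000000ULL);
}
```

With these assumptions, a 2 MHz CPU needs 16 cycles between accesses while the display is drawn in graphics mode, but only 4 during vertical blank, which is the length of a single 6502 absolute store. The 4300 uS window works out to 8600 cycles at 2 MHz, before subtracting whatever the interrupt routine itself uses.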
Re: WIP: Reversi
Yeah, I'll try to rewrite my game to wait for vertical blank. I'm not sure if a complex interrupt handler routine can be used while the main program busy-waits for a vertical blank, or if all custom routines (e.g. music player) that I normally place in the interrupt handler must be moved to the main program somehow. Assuming an interrupt happens only and every time a vertical blank occurs, perhaps the interrupt handler can set some flag in CPU RAM that the main program waits for and then clears, i.e. the reverse of what I tried earlier with queueing calls for the interrupt handler to execute.
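The flag idea could look roughly like the following host-side C model. It is only a sketch of the handshake, not code for the real machine: every name is hypothetical, the counters stand in for the music player and the VRAM writes, and it relies on the assumption stated above that the interrupt fires once per vertical blank.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names. On real hardware the flag would be a byte in
   CPU RAM and irq_handler would be reached through the IRQ vector. */
volatile uint8_t vblank_flag = 0;
uint32_t music_ticks = 0;    /* stand-in for the music player   */
uint32_t frames_drawn = 0;   /* stand-in for the VRAM updates   */

/* Interrupt side: do the timing-critical work, then signal. */
void irq_handler(void)
{
    music_ticks++;       /* music player stays in the interrupt  */
    vblank_flag = 1;     /* tell the main loop that VBI started  */
}

/* Main-loop side: one polling step. Returns true if the VDP update
   ran, false if the main loop has to keep waiting. */
bool main_loop_step(void)
{
    if (!vblank_flag)
        return false;    /* no vertical blank seen since last update */
    vblank_flag = 0;     /* clear before writing so the next IRQ counts */
    frames_drawn++;      /* VRAM writes would go here */
    return true;
}
```

On the real machine the polling step would simply be a tight loop on the flag. Note that if two interrupts arrive before the main loop gets around to polling, the model draws only once, so game logic that takes longer than a frame drops VDP updates but the music keeps its timing.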
Re: WIP: Reversi
Hi, I think you can let the music player stay in the interrupt. That is also good for exact timing.
Re: WIP: Reversi
And don't forget that part of the BIOS interrupt handler reads the VDP status register BEFORE jumping to your interrupt routine, so the register will already be reset.
But it stores the value in $0C.
[CARTRIDGE IRQ HANDLER]
FF3F: pha
FF40: txa
FF41: pha
FF42: tya
FF43: pha
FF44: lda $2001 ; VDP port #1 read (status); this resets the INT flag, as needed
FF47: sta $0C
FF49: jsr $FF58
FF4C: jsr $FA00
FF4F: jmp ($BFEA) ; --> Your IRQ
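Why this matters for your own routine can be shown with a tiny host-side C model of the first two relevant instructions in the listing (lda $2001 / sta $0C). It is a sketch under assumptions: only the interrupt flag in bit 7 of the status register is modelled, and all names are invented for illustration.

```c
#include <stdint.h>

#define VDP_STATUS_F 0x80u   /* bit 7: vertical blank interrupt flag */

uint8_t vdp_status = 0;      /* model of the VDP status register   */
uint8_t zero_page_0c = 0;    /* model of CPU RAM location $0C      */

/* Model of a CPU read of the status port ($2001 in the listing):
   returns the current value, and the read itself clears the flag. */
uint8_t read_vdp_status(void)
{
    uint8_t value = vdp_status;
    vdp_status &= (uint8_t)~VDP_STATUS_F;
    return value;
}

/* Model of the BIOS step: lda $2001 / sta $0C. */
void bios_irq_prologue(void)
{
    zero_page_0c = read_vdp_status();
}

/* What a user routine sees if it reads the port again. */
int user_sees_flag_via_port(void)
{
    return (read_vdp_status() & VDP_STATUS_F) != 0;
}

/* What a user routine sees in the copy saved at $0C. */
int user_sees_flag_via_copy(void)
{
    return (zero_page_0c & VDP_STATUS_F) != 0;
}
```

In this model, once the BIOS step has run, a second read of the port no longer shows the flag, while the copy at $0C still does. So a routine entered through ($BFEA) that wants the status bits should look at $0C rather than read $2001 again.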