2017/04/04

Development Insight #4: SNES Bad Apple!!

For those unaware, Bad Apple!! is the name of a song from Lotus Land Story (Touhou Gensoukyou). More famously, it's the name of a video featuring several Touhou girls in silhouette, alongside a remix of the original song (with vocals by "Nomico"), first posted onto NicoNicoDouga years ago. More relevantly, it's become an unwritten challenge to port that video to any and every computer, display, game system, processor, and architecture. And that's where we'll start.




Normally I do things for fame and fortune. But ASM doesn't bring any fortune, and my fame dies out rather quickly... So once in a while, I'll do something simply to test my skills. I discovered that people have been porting Bad Apple to all sorts of hardware, and even saw a SNES version. I decided to try it myself, just to see if I could. I had worked with MSU video before, but this would obviously be more challenging. For starters, I had zero personal experience with the SPC at the time, though I did possess a sort of "base" SPC program that wiiqwertyuiop made.

Of note is that I released 2 versions: one during a summer C3 (too lazy to look up which), and one during a random day some time afterwards. The first one had severe audio problems, because I had started the project a week or two before C3 and didn't have time to fix the audio. The second version fixed the problems with the first, and was pretty much perfect, aside from the fact it was not released during C3 and thus received far less attention. I'll explain the audio issues later. But enough talk; on to the actual homebrew.

The final result was a 2BPP video @ 15fps and 32kHz mono audio packed into a 7.5MB ExHiROM. Let me explain these choices.

The SNES really, really sucks at FMV playback. NMI is too short, and ROM is too small. The NMI problem can be solved by using PAL instead of NTSC, but PAL has a weird refresh rate and screen resolution, and is associated with Yurop, so I stuck with NTSC. The ROM problem can be solved by using either MSU-1 or the SNES CD prototype that was unearthed. But then it becomes really simple. This is a challenge, after all. It should be done on the bare metal; no enhancement chips. Except we need a larger ROM either way. So I went with the ExHiROM format, which technically allows for 8MB, but in reality only 7.5MB is accessible. That's because the first half of banks $70-$7F (a total of 512KB) is completely inaccessible, due to SRAM being mapped there. Luckily, the second half can be accessed through banks $30-$3F.
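As a quick sanity check on the 7.5MB figure, here is the arithmetic as a small Python sketch. It only assumes that a SNES bank is 64KB and that half of each of the 16 banks $70-$7F is lost, as described above; the variable names are mine.

```python
# Sizes in bytes. The bank layout follows the description in the post.
KIB = 1024
MIB = 1024 * KIB

EXHIROM_NOMINAL = 8 * MIB           # what ExHiROM nominally allows
BANK_SIZE = 64 * KIB                # one SNES bank is 64 KiB
blocked_banks = range(0x70, 0x80)   # banks $70-$7F, 16 in total

# Only the first half of each of those banks is unreachable.
blocked_bytes = len(blocked_banks) * (BANK_SIZE // 2)
usable_bytes = EXHIROM_NOMINAL - blocked_bytes

print(blocked_bytes // KIB, "KiB blocked")
print(usable_bytes / MIB, "MiB usable")
```

Sixteen half-banks of 32KB each come to 512KB, which leaves 7.5MB of the nominal 8MB.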

The original video is black-and-white. Except not really; there are grays throughout (shadows, transitions, antialiasing, etc). 24-bit color can only display 256 shades of gray, including black and white (inb4 50). The SNES uses 15-bit color (5 bits per channel), so it can only display 32 shades of gray (256/8). That means using an 8BPP mode would be idiotic. So we can either go with 4BPP or 2BPP. Since the grays are much less prominent than black and white, I believed it wasn't necessary to display half of the entire grayscale palette. Black, white, and 2 grays should be OK. Hence 2BPP. We also save NMI time this way. No dithering though, because that complicates decompression, which I'll cover now.
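To make the 2BPP choice concrete, here is a rough Python sketch of converting grayscale pixels into SNES 2BPP tile data. The quantization rule (keep the top 2 bits, no dithering) is my assumption for illustration and not necessarily what the actual converter did; the function names are hypothetical. The tile layout is the standard SNES 2BPP one: 16 bytes per 8x8 tile, with each row stored as a bitplane 0 byte followed by a bitplane 1 byte.

```python
def quantize_to_2bpp(gray):
    """Map an 8-bit gray value (0-255) to one of 4 levels (0-3).

    Assumption: plain truncation to the top 2 bits, no dithering.
    """
    return gray >> 6


def encode_tile_2bpp(pixels):
    """Encode an 8x8 tile of 2-bit values (rows of 8 ints) into 16 bytes.

    Each row becomes two bytes: bitplane 0, then bitplane 1.
    The leftmost pixel of a row lands in bit 7.
    """
    out = bytearray()
    for row in pixels:
        plane0 = 0
        plane1 = 0
        for x, value in enumerate(row):
            bit = 7 - x
            plane0 |= (value & 1) << bit
            plane1 |= ((value >> 1) & 1) << bit
        out += bytes((plane0, plane1))
    return bytes(out)


# Example: a tile whose rows all run black, dark gray, light gray, white, twice.
gray_row = [0, 64, 128, 255, 0, 64, 128, 255]
tile = [[quantize_to_2bpp(g) for g in gray_row] for _ in range(8)]
encoded = encode_tile_2bpp(tile)
```

Each pixel costs 2 bits instead of the 8 an 8BPP mode would use, so a tile shrinks from 64 bytes to 16 before any compression is applied.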

Obviously, 2BPP alone isn't enough to fit a 3-minute video in the roughly 3.5MB left over (I was reserving 4MB for audio). The frames need to be compressed. So after much trial and error, and toying with different compression methods (including an attempt at buffering), I cheaped out and went with the quickest to implement, and also the first one I tested: LZ2, decompressed on-the-fly. The LZ2 decompression routine used is the optimized version included in Lunar Magic (taken from the site, not from LM itself). It's fast, but some scenes in the video are just too complex for it, like the Youmu-Yuyuko scene with the sakura. Actually, most of the video frames take too long to decode within 2 display frames (which is what 30fps would need). So even though most of them don't need more than 3 display frames (20fps), in the end I decided a hard limit of 15fps (4 display frames per video frame) would be easiest. With buffering and rearranging of the tilemap, 30fps can probably be achieved with LZ2. But that's something I'll try another time.
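Here is a back-of-the-envelope Python sketch of why compression is unavoidable and how decode time maps to frame rate. The 256x224 frame size is my assumption (the post doesn't state the resolution), the 180-second length is the rounded "3 minute" figure, and the 3.5MB video budget is simply the 7.5MB ROM minus the 4MB reserved for audio. Tilemap and code overhead are ignored.

```python
MIB = 1024 * 1024

# Assumptions for illustration only (not stated in the post):
WIDTH, HEIGHT = 256, 224      # full NTSC SNES picture
BITS_PER_PIXEL = 2            # 2BPP
VIDEO_SECONDS = 180           # "3 minute video", rounded
FPS = 15
DISPLAY_HZ = 60               # NTSC, approximately

raw_frame_bytes = WIDTH * HEIGHT * BITS_PER_PIXEL // 8
frame_count = VIDEO_SECONDS * FPS
raw_total = raw_frame_bytes * frame_count

video_budget = int(7.5 * MIB) - 4 * MIB        # ROM minus the audio reservation
needed_ratio = raw_total / video_budget        # average compression required
budget_per_frame = video_budget // frame_count


def playback_fps(display_frames_per_video_frame):
    """Frame rate if every video frame is held for N display frames."""
    return DISPLAY_HZ / display_frames_per_video_frame


print(raw_frame_bytes, "bytes per raw frame")
print(round(raw_total / MIB, 1), "MiB uncompressed")
print(round(needed_ratio, 1), ": 1 compression needed on average")
print([playback_fps(n) for n in (2, 3, 4)])
```

Under these assumptions the raw video is about ten times larger than the space available, and holding each video frame for 2, 3, or 4 display frames gives 30, 20, or 15fps respectively.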

The video was a cakewalk compared to the audio. While video compression can be variable, the BRR format for SNES audio is hardcoded to a 32:9 compression ratio (raw 16-bit samples are encoded in blocks of 16 words, or 32 bytes. Each word is cut to 4 bits, and a header byte is added, totalling 9 bytes). This meant that I was much more limited in terms of what I could fit into the ROM. The 4MB I reserved for audio is enough to fit around 4 minutes of 32kHz audio. In terms of quality, 32kHz audio is the maximum the SPC can do. There's a Gaussian filter the SPC always applies that can apparently be bypassed somehow using the echo buffer, but this was my first foray into the SPC and I'm not that crazy.
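The BRR numbers are easy to check with a few lines of Python. This sketch only does the size arithmetic (16 samples in, 9 bytes out); it does not perform any actual BRR encoding, and the function name is mine.

```python
SAMPLES_PER_BLOCK = 16   # 16-bit samples covered by one BRR block
BYTES_PER_BLOCK = 9      # 1 header byte + 16 samples * 4 bits


def brr_size(num_samples):
    """Bytes needed to store num_samples of mono audio as BRR."""
    blocks = -(-num_samples // SAMPLES_PER_BLOCK)   # round up to whole blocks
    return blocks * BYTES_PER_BLOCK


SAMPLE_RATE = 32000
bytes_per_second = brr_size(SAMPLE_RATE)            # one second of audio
audio_budget = 4 * 1024 * 1024                      # the 4MB reserved for audio
seconds_that_fit = audio_budget / bytes_per_second

print(bytes_per_second, "bytes per second of 32kHz mono BRR")
print(round(seconds_that_fit / 60, 2), "minutes fit in 4MB")
```

At 32kHz that works out to 18000 bytes per second, so 4MB holds a little under 4 minutes of mono audio.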

Once the quality and so on were settled, next came the daunting task of figuring out how to stream the audio to the SPC. Normally, SPC transfers are cumbersome. The SNES tells the SPC it wants to send something, the SPC tells the SNES it's ready, the SNES sends a byte, the SPC receives the byte, the SPC tells the SNES it got it, the SNES tells the SPC cool, repeat for the next 4095 bytes you want to send. This is why games only changed music during black screens or boring areas. But if we take shortcuts, we can speed up the process. By shortcuts, I mean make assumptions. Assume the SPC is always ready to receive. Assume the SNES is sending data properly. That cuts the time down significantly. But the SPC and SNES are running on different clocks. So the SNES will need to wait for the SPC to process its request, which means you lose time either way. A lot of time, since there's still the overhead from the opcodes. Luckily, there's a solution.

I've heard of people (or games) using HDMA to stream audio. How do they do that? Well, that's something I had to figure out. Basically, send a few bytes to the SPC ports at the end of each scanline. While the SNES does its shenanigans during the next scanline (like decompressing video), the SPC will be processing those bytes. The SPC will then finish and wait for the next transfer. This needs to be well-timed (SPC-side), or else a transfer will go bad and you get glitchy audio. That's just half of the problem though. Once the audio is in the SPC... how do you play it???? I'm serious, this problem plagued me for days. Short samples are easy, but a 3-minute one? I considered and attempted various methods, namely having two different instruments and switching between them. But I always got crackling, due to the SPC not liking it when you switch between instruments too quickly. Either way, this was the method I used in the first version (hence the terrible quality). And I left the project at that, not touching it for months.
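To get a feel for how many bytes per scanline "a few" has to be, here is a rough Python estimate. The figures are assumptions on my part rather than details from the actual ROM: HDMA firing on 224 visible scanlines per frame, about 60 frames per second, and up to 4 bytes per scanline (one for each of the four APU I/O ports, $2140-$2143). The 18000 bytes per second is the BRR data rate for 32kHz mono.

```python
import math

# Assumptions for illustration only:
VISIBLE_SCANLINES = 224      # scanlines per frame on which HDMA transfers
FRAMES_PER_SECOND = 60       # NTSC, approximately
APU_PORTS = 4                # $2140-$2143

BRR_BYTES_PER_SECOND = 32000 * 9 // 16   # 32kHz mono BRR

transfers_per_second = VISIBLE_SCANLINES * FRAMES_PER_SECOND


def capacity(bytes_per_scanline):
    """Bytes per second HDMA can deliver at a given payload per scanline."""
    return transfers_per_second * bytes_per_scanline


min_bytes_per_line = math.ceil(BRR_BYTES_PER_SECOND / transfers_per_second)

print(min_bytes_per_line, "bytes per scanline needed at minimum")
print(capacity(APU_PORTS), "bytes per second with all four ports")
```

Under these assumptions 2 bytes per scanline already keeps up with the audio, and using all four ports leaves plenty of headroom for handshake or control bytes.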

I then had an epiphany. Everything in ARAM is writeable whenever. Including a sample's loop pointer. So my idea was: have an infinite instrument play (infinite sustain, no release). The instrument is tied to a looping sample. Upload the streaming audio to a certain part of ARAM, and when it's nicely buffered, change the sample's loop pointer to point to this part of ARAM, and upload to a different part of ARAM. Note the buffering. Even if the transfer is well-timed, the SPC (or the DSP, or both) just isn't perfectly clocked. The DSP will eventually reach the end of a sample before you're able to change the loop pointer. You can only delay this event (to my knowledge), and that's by buffering the data received. The bigger the buffer, the later this DSP-catch-up occurs. Though remember ARAM is limited. I buffered enough so that the catch-up happens after the video/audio ends (~3min).
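The loop-pointer trick can be modelled in a few lines of Python. This is a toy simulation, not SPC code: ARAM is a bytearray holding two equal-sized buffers, the "DSP" plays one buffer to its end and then jumps to whatever the loop pointer says, and the uploader fills the idle buffer before retargeting the pointer. All names are hypothetical, and the timing drift described above is deliberately left out (the upload always finishes in time).

```python
def play_stream(chunks, buf_len):
    """Simulate double-buffered streaming through a looping sample.

    chunks: list of bytes objects, each exactly buf_len long.
    Returns the bytes the simulated DSP played, in order.
    """
    aram = bytearray(2 * buf_len)    # two buffers, back to back
    starts = (0, buf_len)            # start address of each buffer
    played = bytearray()

    # Prime the first buffer and start the voice on it.
    aram[0:buf_len] = chunks[0]
    play_buf = 0
    loop_pointer = starts[0]         # default: loop the current buffer

    for next_chunk in chunks[1:] + [None]:
        if next_chunk is not None:
            # Upload the next chunk into the idle buffer while the other plays...
            idle = 1 - play_buf
            aram[starts[idle]:starts[idle] + buf_len] = next_chunk
            # ...and only then point the loop at it.
            loop_pointer = starts[idle]

        # The DSP plays the current buffer through to its end.
        start = starts[play_buf]
        played += aram[start:start + buf_len]
        if next_chunk is None:
            break

        # At the end of the buffer, the DSP jumps to the loop pointer.
        play_buf = starts.index(loop_pointer)

    return bytes(played)


chunks = [bytes([i]) * 4 for i in range(5)]
output = play_stream(chunks, 4)
```

If the pointer were not updated before the current buffer ran out, the voice would loop back over stale data, which is the catch-up event the buffering is meant to push past the end of the video.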

As for the audio being mono... well, stereo would require double the space (and double the time). So it was either 16kHz of stereo or 32kHz of mono. The choice was easy. Also, the WAV->BRR converter I used only allowed mono.


And that's how I did Bad Apple. Not very noteworthy in the grand scheme of things, but definitely the greatest project I've done. I learned a lot through it, especially about the SPC. It brought neither fame nor fortune, but either way I am happy I did it. Video sample, and download (including source) are here: https://www.youtube.com/watch?v=WdWXUdP4zaI
