softglow's notebook

Dispatches from the Depths of a Super Nintendo

New Year, New Direction

Well, it’s 2014. The demo at work went well, the full project is approved, there were many celebrations, and a number of small bugs in legacy projects got fixed up. That closes out 2013 and brings me back to normal at work.

The time away from the low-level “this is the next thing along this path” coding at home gave me some time to think about whether that was the right path, or even an effective path, and I decided that it really wasn’t.

Repo Shakeup

To blow off steam between the demo and end of the year, instead of coming back to music, I wrote my first-ever hack: Item-passable blocks which act as air when Samus has a specific piece of equipment enabled.

Then, I deleted the alien-spc repo. The idea of one great GUI that allowed for awesome editing (and “random seeking” via emulating from zero-to-playhead in a background thread) would have been technically awesome, but after spending over a month just getting the capability to pass audio between threads, without any actual emulation happening, it was clear that this was the long, slow way around.

In place of that, there’s midi-spc-kit. My vision for this is a pair of Python scripts (sharing some common library code in midi_spc), one to handle copying from a MIDI to an SPC image, and one to handle copying changes made in the SPC image back into the game ROM.

MIDI for tracker music?

I actually plan to parse the MIDI event stream and automatically find the pattern/track/song loop points, to optimize the SPC output. Size is going to matter.

And with that… Happy 2014! Let’s see if I finish anything music related this year!

On Break

There’s an end-of-year rush on at work, involving a lot of hardcore programming leading up to a demo day, so I don’t have the bandwidth left to work hard at home until about Christmas. Further progress and updates aren’t expected before then.

DSP Emulators

I was reading around byuu’s forum (technically, preparing to make sure my question regarding the accuracy of byuu’s SMP/DSP vs. blargg’s was in the right sub-forum) when I came upon the answer to my question.

To summarize it into a perfect viral posting for social sites…

TIL: 3 Amazing Facts About SNES Audio Emulators

  1. blargg’s DSP is 100% pure awesome.
  2. byuu’s SMP is better than blargg’s, though.
  3. byuu’s main SMP only has style points over the alt SMP core. E.g. it emulates the TEST register which nobody uses.

So was I wasting my time?

Partly. I think extracting sfc/alt/{smp,dsp} is the way to go, which makes the deep dive on the main cores mostly irrelevant, but everything I needed to know to extract them, I still need to know to extract the alt cores.

Maybe I’ve Been Wasting My Time?

In higan’s sfc/alt/dsp directory is the alternate DSP emulation core, used when the accuracy profile isn’t selected. It apparently depends on a couple of blargg’s audio libraries, which include… “snes_spc-0.9.0”. Of which blargg boasts:

The accurate DSP passes over a hundred strenuous timing and behavior validation tests that were also run on the SNES. As far as I know, it’s the first DSP emulator with cycle accuracy, properly emulating every DSP register and memory access at the exact SPC cycle it occurs at…

And of course, this is already written as a library, meant to be used in another project. No games required.

My only question is: if snes_spc is that accurate, why doesn’t higan use it for its own accuracy profile? Is byuu’s DSP somehow more accurate? (I suspect so. It’s not like snes_spc is a fast-moving target, as its first public release is also, AFAICT, its only one.) I don’t want to commit one way or the other without an answer…

Where the Samples Are

Looking at the higan v093 source, it’s fairly obvious where the DSP code lives, but how does the sound escape it?

Sending samples to the Interface

In the accuracy profile, the process actually begins down in echo_27(), implemented in sfc/dsp/echo.cpp. It is there that the elusive audio.sample() method is called, which is part of Audio, over in sfc/system/audio.cpp. In the case where there’s no coprocessor, it passes directly into interface->audioSample.

In the complicated case where there is a coprocessor, then the audio heads into a DSP buffer, to be averaged in audio.flush() with the coprocessor’s buffer; the final samples are clamped to 16 bits and finally routed through the same interface->audioSample there.
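The averaging-and-clamping step can be sketched in isolation. This is a hypothetical stand-in (the function name and signature are mine, not higan’s): average the two streams, then saturate into the signed 16-bit range before the result goes to the interface.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch of the coprocessor mixing step described above:
// average the DSP stream with the coprocessor stream, then clamp the
// result to the signed 16-bit range before handing it onward.
int16_t mixSample(int32_t dsp, int32_t coprocessor) {
    int32_t mixed = (dsp + coprocessor) / 2;  // average the two streams
    mixed = std::max<int32_t>(-32768, std::min<int32_t>(32767, mixed));  // clamp to 16 bits
    return static_cast<int16_t>(mixed);
}
```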

Interface

Sharp eyes will have noticed the arrow operator, indicating that interface is a pointer. It’s a pointer to the binding of the current interface, which is to say: it’s an Interface *interface where the type is struct Interface : Emulator::Interface::Bind.

Looking from the outside in, target-ethos/bootstrap.cpp creates the binding-interface, creates and appends all the known system emulators to a vector, then iterates over the vector to set each system emulator’s binding to the single global binding that it just created.

Although audioSample is virtual, I haven’t found anything yet which actually overrides it.

audioSample & dspaudio

The implementation of audioSample in target-ethos/interface/interface.cpp does very little:

signed samples[] = {lsample, rsample};
dspaudio.sample(samples);                  // push the new frame into the resampler
while(dspaudio.pending()) {                // drain any resampled frames
    dspaudio.read(samples);
    audio.sample(samples[0], samples[1]);  // hand them to the platform audio chain
}

The dspaudio is a nall::DSP declared in target-ethos/ethos.cpp and implemented in nall/dsp/core.hpp. Pretty much, samples go into the DSP, they get resampled to the output sample rate, and then those results can be read out again.
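That sample-in, pending, read-out shape can be mocked up with a toy class. This is not nall::DSP’s real algorithm, just an illustration of the interface: it does naive zero-order-hold resampling at a fixed 2:3 ratio (e.g. 32kHz in, 48kHz out) by duplicating every other input frame.

```cpp
#include <deque>
#include <utility>

// Toy stand-in for the nall::DSP flow described above: frames go in via
// sample(), get "resampled" to the output rate, and are read back out.
// Here, 2 input frames become 3 output frames (a crude 2:3 ratio).
class ToyResampler {
    std::deque<std::pair<signed, signed>> out;
    unsigned count = 0;
public:
    void sample(signed s[2]) {
        out.emplace_back(s[0], s[1]);
        if (++count % 2 == 0) out.emplace_back(s[0], s[1]);  // duplicate every other frame
    }
    bool pending() const { return !out.empty(); }
    void read(signed s[2]) {
        s[0] = out.front().first;
        s[1] = out.front().second;
        out.pop_front();
    }
};
```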

The trick here is that the audio visible in this particular scope is actually AudioInterface audio from ruby/ruby.cpp. Platform audio drivers live under ruby/audio/*.cpp and AudioInterface::sample passes the samples along to whatever driver happens to be connected.

tl;dr

If the madness hasn’t taken me:

  1. The system DSP produces samples and passes them into the system audio chain.
  2. The system audio chain passes samples into the platform interface.
  3. The platform interface pumps samples through the resampler.
  4. The resulting samples are pushed into the platform audio driver.

Solved: The Mystery of Privilege

Just a quick note, since privileged is actually not a C++ keyword, the thought finally crossed my mind: what if it’s a macro?

One quick grep later, there was one match, in emulator/emulator.hpp lines 72–76 (for anyone who hasn’t read the whole blog, this is the higan v093 source I’m talking about):

#if defined(DEBUGGER)
    #define privileged public
#else
    #define privileged private
#endif

Yeah, it looks like a good day to be humble about my programming ability. It only took me, like, a month to come up with that idea.

Brief Intermission

No progress is expected this week. I’m taking the week off to catch up on some other things around the house.

I’m disappointed, actually. I wanted to have higan connected by the end of October, not just a stupid triangle wave generator. (Which is the easiest waveform to program—since dy/dt = C, all you need is a fixed delta that gets its sign inverted when reaching the edge of the range.)
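That fixed-delta idea can be written in a few lines. A minimal sketch (names and the peak/delta values are mine, for illustration): add a constant step each frame, and invert its sign whenever the value reaches either edge of the range.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal triangle-wave generator: dy/dt = C, so we add a fixed delta each
// step and flip its sign at the edges of the range. For a clean triangle,
// pick a delta that evenly divides the peak.
std::vector<int16_t> triangle(size_t frames, int16_t peak, int16_t delta) {
    std::vector<int16_t> out;
    int32_t value = 0, step = delta;
    for (size_t i = 0; i < frames; i++) {
        out.push_back(static_cast<int16_t>(value));
        value += step;
        if (value >= peak || value <= -peak) step = -step;  // invert the delta at either edge
    }
    return out;
}
```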

SndThread Lives

I wrote that I had the thread buffer handoff working correctly. Hah. Ha! HAHAHAHAHAHA!

Now it works. I got a thin, quiet, reedy whine out of it last night, and today, I tracked down the final bug: the IO formatter was shifting in the wrong direction, turning a sample like 3560 into output bytes 6000, because 356000 & ff (the implicit conversion to byte) is zero.
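The bug is easy to demonstrate in miniature. This is a hypothetical reconstruction (the function name and little-endian byte order are my assumptions, not the actual formatter): the high byte of a 16-bit sample must come from a right shift; shift left instead, and the & ff mask sees only zeros.

```cpp
#include <cstdint>

// The high byte of a 16-bit sample must come from a *right* shift.
// The buggy version shifted left, so the & 0xff mask always yielded zero
// where the high byte belonged.
void formatSampleLE(int16_t sample, uint8_t out[2]) {
    uint16_t u = static_cast<uint16_t>(sample);
    out[0] = u & 0xff;         // low byte first (little-endian)
    out[1] = (u >> 8) & 0xff;  // high byte from a right shift
    // buggy version: out[1] = (u << 8) & 0xff;  // always zero
}
```

With the bug, a sample of 0x3560 comes out as bytes 60 00 instead of 60 35, which matches the “3560 becomes 6000” symptom above.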

This started life as a triangle wave, but when I got a bad sound, I tried ‘flattening’ the peaks to put in more sound power and make it fuller. This did not work: not only was sound power unimportant, but instead of holding the peak value, the generator held zero.

But, since this bug is clearly in the generator and the rest of the output chain is working to suit me, I’m going to leave it unfixed and move on to connecting higan’s SMP/DSP subsystem in place of my broken generator.

Progress Report

I got the buffer-passing code that I described on Sunday running. One important lesson: if you’re not going to call QIODevice::write to put data into the device (because, say, you’re swapping in a buffer and converting it at read() time, i.e. you’re just doing it wrong), then you must emit the readyRead and bytesWritten signals yourself when you swap in a buffer. Otherwise, the audio output never gets signaled that the device is readable, and stops output after a timeout.

</tangent>

It turns out that the tone generator I’ve connected for testing (to stand in for higan) is broken. Completely busted. It produces an infinite number of repetitions of:

0
0
-32768
-32768

That is not a triangle wave. And since no horrible howl comes out of my speakers, either the audio output stage is also broken, or something is detecting “this makes no sense” and sparing my ears.

Oh well. If I’m lucky, maybe I’ll get to the point where I have an SPC player using higan’s audio engine by 2014.

On QAudioOutput

Most Qt resources seem to want me to use higher-level classes like QMediaPlayer or QSoundEffect to point at extant files, or possibly a QAudioBuffer already filled with a QByteArray. There’s really not much out there about synthesizing audio and playing it straight through a QAudioOutput.

The latter may operate in push mode or pull mode; the docs will tell you as much. What may be less obvious is how those modes are chosen. It’s simply your choice of overloaded method:

QIODevice *QAudioOutput::start();          // push mode: you write to the returned device
void QAudioOutput::start(QIODevice *dev);  // pull mode: Qt reads from your device

Looking over the pre-made classes that implement QIODevice, I notice that they love to tell you when data is ready to read but not when data may be written. And in both modes, the QAudioOutput is doing the reading and your application needs to know, somehow, when it should write.

So, I implemented my own QIODevice around my own buffer type. The class actually holds a pair of buffers, one being written by the generator thread and one being read by QAudioOutput. When both are finished, the buffers are swapped. Whenever the output calls read, I format data from the app’s buffer into bytes, directly into the output pointer.

And actually, nobody calls write. The generator can be connected directly to the app buffer, which provides its own append() method that consumes (potentially many) snd_sample_ts.

There are a pair of signals and slots each side uses to communicate about their shared buffer:

class SndIO : public QIODevice {
    Q_OBJECT
    ...
public slots:
    void writeComplete(SndBuf *buf);
signals:
    void readyWrite(SndBuf *buf);
    void emptied();
    void underrun();
};
class SndThread : public QObject {
    Q_OBJECT
    ...
public slots:
    void start(SndBuf *buf);
signals:
    void finished(SndBuf *buf);
};

The main thread then connects SndIO::readyWrite to SndThread::start and SndThread::finished to SndIO::writeComplete. The SndBuf itself isn’t used simultaneously by multiple threads, because the IO won’t touch it until the Thread has signaled that it has finished filling it.

This produces a four-state system, starting with “both buffers empty” before the main thread wires the objects and initiates the first fill on the IO.

  1. empty/empty
  2. empty/filling
  3. draining/filling
  4. draining/full

When state 2 emits finished, state 3 begins; emission of emptied there returns to state 2, otherwise, finished happens again and state 4 is entered. From state 4, the only move is back to state 3 when the emptied signal is emitted. After the initialization period (once state 3 has been entered for the first time), it’s possible for state 2 to receive another read from the QAudioOutput; this emits the underrun signal without changing state.
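The transitions above can be sketched as a tiny state machine. Names here are mine, not from the actual code, and the real implementation drives this with Qt signals and slots rather than free functions; this just pins down the legal moves.

```cpp
// Sketch of the four buffer states and the transitions described above.
// finished = the generator finished filling a buffer;
// emptied  = the output drained its buffer.
// (The underrun signal fires on a read in EmptyFilling, with no state change.)
enum class BufState { EmptyEmpty, EmptyFilling, DrainingFilling, DrainingFull };

BufState onFinished(BufState s) {
    switch (s) {
        case BufState::EmptyFilling:    return BufState::DrainingFilling;
        case BufState::DrainingFilling: return BufState::DrainingFull;
        default:                        return s;  // no-op elsewhere
    }
}

BufState onEmptied(BufState s) {
    switch (s) {
        case BufState::DrainingFilling: return BufState::EmptyFilling;
        case BufState::DrainingFull:    return BufState::DrainingFilling;
        default:                        return s;  // no-op elsewhere
    }
}
```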

This is actually a slightly more complex version of the producer/consumer problem. Instead of a single producer and consumer directly communicating, there is an intervening QIODevice carrying data across threads, consuming the generator’s output, then marshaling and producing it for the output to consume.

I’m also effectively using signals and slots to let the event loop block the producer when it gets too far ahead. When that happens, no “produce more” signal is delivered until the output can catch up.

All that said, this setup compiles but I haven’t finished wrapping a test program around it yet.