While trying my hand at implementing floating-point value clipping in assembly, I ended up with code both smaller and faster than what the compiler generated from my old C macro. Using it in my rather simple Evil Limiter plugin forced the compiler to lay out the surrounding code a bit differently, making that code larger but faster; the two code-size changes cancelled out, and 40% of the CPU usage (when running in "hard" mode, the least intensive) was sliced off.
And this value clipping code, being both tight and fast, will be a perfect fit for the new multi-format I/O system; once I have put it in, that system will finally be done.