An implementation of the Prospero challenge in C# (along with a little interactive visualization)
profan 4d8d2bf1f0 another note 2026-06-07 19:05:20 +01:00
.idea/.idea.Sharpero/.idea implement basic constant folding, eliminating around 17 % of the instructions from the hot loop and folding it into the instructions 2026-05-31 17:19:15 +01:00
Programs some cleanup 2026-06-06 22:02:02 +01:00
.gitignore hmm 2026-05-25 17:51:38 +01:00
Program.cs add a little readme, factor things properly 2026-06-07 18:32:43 +01:00
README.md another note 2026-06-07 19:05:20 +01:00
Sharpero.csproj dependency 2026-06-07 18:49:01 +01:00
Sharpero.sln iNITIal commit, simd accelerated interpreter 2026-05-25 11:38:28 +01:00
Sharpero.sln.DotSettings.user implement optional compilation of inner loop to CIL, unrolling the loop into a number of functions we invoke 2026-06-06 20:49:12 +01:00
global.json iNITIal commit, simd accelerated interpreter 2026-05-25 11:38:28 +01:00
sharpero.png add image of the application to the readme 2026-06-07 18:41:55 +01:00

README.md

Sharpero

This is an interpreter/compiler for the Prospero challenge. It implements some basic optimizations: removing constants from the instruction tape, vectorization, parallelization, and optionally compiling the evaluator's inner loop to CIL prior to invocation.

It doesn't perform any sophisticated interval-arithmetic-based optimizations (... yet); it's a brute-force approach.
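To make "brute-force" concrete, here is a minimal sketch of evaluating every pixel with both parallelism and vectorization. It is not the project's actual evaluator: the names BruteForce, Render and Evaluate are made up for illustration, the expression is a stand-in unit circle rather than the Prospero program, and it assumes negative values count as "inside" the shape.

```csharp
using System;
using System.Numerics;
using System.Threading.Tasks;

// Usage: render a 64x64 image of the stand-in shape.
byte[] image = BruteForce.Render(size: 64);
// The centre pixel lies inside the circle, the top-left corner outside it.
Console.WriteLine($"{image[32 * 64 + 32]} {image[0]}");

static class BruteForce
{
    // Hypothetical stand-in for the real expression: x^2 + y^2 - 1 (a unit circle).
    static Vector<float> Evaluate(Vector<float> x, Vector<float> y)
        => x * x + y * y - Vector<float>.One;

    public static byte[] Render(int size)
    {
        var image = new byte[size * size];
        int lanes = Vector<float>.Count; // pixels evaluated per SIMD operation

        // Parallelization: each row is an independent unit of work.
        Parallel.For(0, size, row =>
        {
            var y = new Vector<float>(1f - 2f * row / (size - 1));
            var xs = new float[lanes];

            // Vectorization: evaluate `lanes` neighbouring pixels at once.
            for (int col = 0; col < size; col += lanes)
            {
                for (int lane = 0; lane < lanes; lane++)
                    xs[lane] = -1f + 2f * (col + lane) / (size - 1);

                Vector<float> value = Evaluate(new Vector<float>(xs), y);

                // Write only the lanes that fall inside the image.
                for (int lane = 0; lane < lanes && col + lane < size; lane++)
                    image[row * size + col + lane] = value[lane] < 0f ? (byte)255 : (byte)0;
            }
        });
        return image;
    }
}
```

Every pixel is evaluated in full; nothing is skipped or pruned, which is what an interval-arithmetic approach would try to avoid.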

This program is also an interactive visualizer of the rendering process: you can see exactly how it writes the image data out, toggle vectorization, parallel execution, and compilation on or off, and observe the effect on runtime.

Interesting Things

Compilation to CIL

When attempting to compile the inner loop (flattening the instructions into one big C# function), I ran into limits where overly large functions made the JIT quite unhappy: either it outright refused to compile functions with more than 65 kilobytes of CIL, or it simply performed quite poorly when jitting them. So I ended up with an experimentally derived "max instructions per chunk", which splits the generated inner loop into as many subfunctions as are needed. The final program ends up being something like:

void EvaluateLoop()
{
    EvaluateChunk1();
    EvaluateChunk2();
    EvaluateChunk3();
    // ... etc. With the current program and chunk size, about 200 of these subfunctions are generated.
}
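As a rough illustration of the chunking idea, the sketch below emits one DynamicMethod per group of instructions and then calls them in order. It is not the project's actual code generator: the Op enum, the Instr record, ChunkedCompiler, and the choice of keeping all intermediate values in a shared float[] (so state survives across chunk boundaries) are assumptions made for this example, and it is scalar-only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection.Emit;

// Usage: a three-instruction tape split into chunks of at most two instructions.
var tape = new[]
{
    new Instr(Op.Add, Out: 2, A: 0, B: 1), // slots[2] = slots[0] + slots[1]
    new Instr(Op.Mul, Out: 3, A: 2, B: 2), // slots[3] = slots[2] * slots[2]
    new Instr(Op.Sub, Out: 4, A: 3, B: 0), // slots[4] = slots[3] - slots[0]
};
List<Action<float[]>> chunks = ChunkedCompiler.Compile(tape, maxInstructionsPerChunk: 2);

var slots = new float[] { 1.5f, 2.5f, 0f, 0f, 0f };
foreach (Action<float[]> chunk in chunks)
    chunk(slots); // EvaluateChunk1(), EvaluateChunk2(), ...
// slots[4] now holds (1.5 + 2.5)^2 - 1.5 = 14.5

enum Op { Add, Sub, Mul }

readonly record struct Instr(Op Kind, int Out, int A, int B);

static class ChunkedCompiler
{
    public static List<Action<float[]>> Compile(IReadOnlyList<Instr> tape, int maxInstructionsPerChunk)
    {
        var chunks = new List<Action<float[]>>();

        // Enumerable.Chunk splits the tape into arrays of at most N instructions.
        foreach (Instr[] group in tape.Chunk(maxInstructionsPerChunk))
        {
            var method = new DynamicMethod(
                $"EvaluateChunk{chunks.Count + 1}", typeof(void), new[] { typeof(float[]) });
            ILGenerator il = method.GetILGenerator();

            foreach (Instr instr in group)
            {
                // Emits: slots[Out] = slots[A] <op> slots[B]
                il.Emit(OpCodes.Ldarg_0);
                il.Emit(OpCodes.Ldc_I4, instr.Out);
                il.Emit(OpCodes.Ldarg_0);
                il.Emit(OpCodes.Ldc_I4, instr.A);
                il.Emit(OpCodes.Ldelem_R4);
                il.Emit(OpCodes.Ldarg_0);
                il.Emit(OpCodes.Ldc_I4, instr.B);
                il.Emit(OpCodes.Ldelem_R4);
                il.Emit(instr.Kind switch
                {
                    Op.Add => OpCodes.Add,
                    Op.Sub => OpCodes.Sub,
                    Op.Mul => OpCodes.Mul,
                    _ => throw new NotSupportedException(),
                });
                il.Emit(OpCodes.Stelem_R4);
            }

            il.Emit(OpCodes.Ret);
            chunks.Add((Action<float[]>)method.CreateDelegate(typeof(Action<float[]>)));
        }
        return chunks;
    }
}
```

Because each chunk is its own method, no single method body grows past whatever size the JIT handles well; the chunk size is the knob to tune experimentally.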

RyuJIT Implementation Details

When the C# JIT observes a pattern like this:


[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static T Add<T>(T a, T b)
    where T : unmanaged
{
    if (typeof(T) == typeof(float))
    {
        return Unsafe.BitCast<float, T>(Unsafe.As<T, float>(ref a) + Unsafe.As<T, float>(ref b));
    }
    else if (typeof(T) == typeof(Vector<float>))
    {
        return Unsafe.BitCast<Vector<float>, T>(Unsafe.As<T, Vector<float>>(ref a) + Unsafe.As<T, Vector<float>>(ref b));
    }
    else
    {
        throw new InvalidOperationException();
    }
}

... to the normal C# eye, this looks like you'd end up with a runtime branch every time it runs, right? However, because C# JIT implementations (RyuJIT included) monomorphize generics when T is a value type, we end up with a separate version of this function for each T. Branching on typeof(T) is also a specific pattern the JIT recognises: in each specialized version it knows which comparison is true and drops the other branches, so after the JIT has compiled the function for us it contains no branches at all. Fun!

... Good for us, because I abused this pattern a lot in this program in order to make it easy to toggle on/off vectorization!
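To show how this pattern makes vectorization easy to toggle, here is a small sketch in which a kernel is written once and instantiated for either float or Vector<float>. The Ops and Kernel classes, and the Mul and Broadcast helpers, are hypothetical names for this example (only Add mirrors the function above), and Unsafe.BitCast requires .NET 8 or later.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;

// Usage: the same generic kernel, scalar and SIMD.
float scalar = Kernel.SquarePlusOne(3f);                            // 3 * 3 + 1 = 10
Vector<float> vector = Kernel.SquarePlusOne(new Vector<float>(3f)); // 10 in every lane
Console.WriteLine($"{scalar} {vector[0]}");

static class Ops
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static T Add<T>(T a, T b) where T : unmanaged
    {
        if (typeof(T) == typeof(float))
            return Unsafe.BitCast<float, T>(Unsafe.As<T, float>(ref a) + Unsafe.As<T, float>(ref b));
        if (typeof(T) == typeof(Vector<float>))
            return Unsafe.BitCast<Vector<float>, T>(
                Unsafe.As<T, Vector<float>>(ref a) + Unsafe.As<T, Vector<float>>(ref b));
        throw new InvalidOperationException();
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static T Mul<T>(T a, T b) where T : unmanaged
    {
        if (typeof(T) == typeof(float))
            return Unsafe.BitCast<float, T>(Unsafe.As<T, float>(ref a) * Unsafe.As<T, float>(ref b));
        if (typeof(T) == typeof(Vector<float>))
            return Unsafe.BitCast<Vector<float>, T>(
                Unsafe.As<T, Vector<float>>(ref a) * Unsafe.As<T, Vector<float>>(ref b));
        throw new InvalidOperationException();
    }

    // Turns a constant into either a plain float or a vector with that value in every lane.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static T Broadcast<T>(float value) where T : unmanaged
    {
        if (typeof(T) == typeof(float))
            return Unsafe.BitCast<float, T>(value);
        if (typeof(T) == typeof(Vector<float>))
            return Unsafe.BitCast<Vector<float>, T>(new Vector<float>(value));
        throw new InvalidOperationException();
    }
}

static class Kernel
{
    // Written once; the JIT produces a separate specialized body per value type T.
    public static T SquarePlusOne<T>(T x) where T : unmanaged
        => Ops.Add(Ops.Mul(x, x), Ops.Broadcast<T>(1f));
}
```

Switching vectorization on or off then comes down to which type argument the evaluator is instantiated with, rather than maintaining two copies of the evaluation code.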

The Program

The Application, in its 1024x1024 window

How do I run it?

You'll need .NET 10. With that installed, dotnet run -c Release should be all you need.

The only dependencies are RayLib-Cs (for the interactivity) and SkiaSharp (for writing out the image).

(Crude) Benchmark Results

On my own machine (CPU: Ryzen 7 4800HS), the results tabulate roughly as follows.

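| Compilation | Parallelism | Vectorization | Evaluation Time | Compilation Time |
|-------------|-------------|---------------|-----------------|------------------|
| enabled     | enabled     | enabled       | 0.2s            | 0.2s             |
| enabled     | enabled     | disabled      | 1.3s            | 0.2s             |
| enabled     | disabled    | enabled       | 1.7s            | 0.2s             |
| enabled     | disabled    | disabled      | 10s             | 0.2s             |
| disabled    | enabled     | enabled       | 0.7s            | N/A              |
| disabled    | enabled     | disabled      | 5.0s            | N/A              |
| disabled    | disabled    | disabled      | 48s             | N/A              |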

Recommendations

Probably don't use this with some kind of agentic LLM workflow, bad things might happen :)

Don't say I didn't warn you.

License

MIT/X11