Thursday, October 28, 2010

Performance Work

So I decided to take a look at performance recently. Although it was not bad (it was still capping out at 60 fps), I was curious to see where my frame time was going. So I hooked up some timers, and it turned out my occlusion tests were taking up around 50% of frame time (the other 50% roughly going to rendering, though that figure includes time spent waiting for vsync)! Some simple early returns using spherical tests brought the occlusion cost back down to 10% of frame time - a simple fix for an unexpected problem!

On top of this, I added some face merging/simplification code for my sprite-to-mesh generator. Again, not necessary - it was just for fun. Previously, I was generating vertices for each pixel from the source image, and using vertex colours to colour them. Ugly, I know, but dead simple and it worked.

Anyway, after a week of banging my head against a wall trying to come up with a working algorithm, I finally have results! It uses a fairly simple heuristic: try to form the biggest face along the local horizontal (U-axis), and then, with that merged face, try to form the biggest face along the local vertical (V-axis). It will not always find the optimal result, but it is good enough and still a massive improvement over before.
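The heuristic can be sketched roughly as below. This is an illustration under assumptions rather than the actual generator code: it treats the sprite as a 2D mask of filled pixels, scans in row order, and the name `merge_faces` and the `(u, v, width, height)` output format are made up for the example.

```python
def merge_faces(mask):
    """Greedily cover the filled cells of a 2D mask with rectangles.

    mask[v][u] is truthy where the sprite has a pixel. Returns a list of
    (u, v, width, height) tuples in pixel units.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    used = [[False] * width for _ in range(height)]
    rects = []

    def free(u, v):
        # A cell can join a face if it is filled and not yet claimed.
        return bool(mask[v][u]) and not used[v][u]

    for v in range(height):
        for u in range(width):
            if not free(u, v):
                continue
            # Step 1: grow along U as far as possible.
            w = 1
            while u + w < width and free(u + w, v):
                w += 1
            # Step 2: grow the whole strip along V while every cell is free.
            h = 1
            while v + h < height and all(free(u + i, v + h) for i in range(w)):
                h += 1
            # Claim the cells so later faces do not overlap this one.
            for dv in range(h):
                for du in range(w):
                    used[v + dv][u + du] = True
            rects.append((u, v, w, h))
    return rects
```

For example, the mask `[[1, 1, 0], [1, 1, 0], [0, 1, 1]]` has six filled pixels and comes out as two faces, `(0, 0, 2, 2)` and `(1, 2, 2, 1)`, instead of six.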

This new mesh structure also means I cannot colour with vertex colours anymore, since a merged face can span pixels of different colours. So colouring is now done through texture lookups into the original source image. The texture is mapped in a subsequent pass, once the simplified mesh has been generated.
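As a rough illustration of that mapping pass, each merged face only needs the texture coordinates of its four corners. The sketch below assumes faces are stored as `(u, v, width, height)` in pixel units with the origin at the top-left of the image; both the storage format and the function name `rect_uvs` are hypothetical.

```python
def rect_uvs(rect, image_width, image_height):
    """Return normalised texture coordinates for the four corners of a face.

    rect is (u, v, width, height) in pixels. Corners are returned in the
    order top-left, top-right, bottom-right, bottom-left.
    """
    u, v, w, h = rect
    u0 = u / image_width
    v0 = v / image_height
    u1 = (u + w) / image_width
    v1 = (v + h) / image_height
    return [(u0, v0), (u1, v0), (u1, v1), (u0, v1)]
```

A face covering pixels 1 to 2 of row 2 in a 4x4 image, `rect_uvs((1, 2, 2, 1), 4, 4)`, maps to the corners `(0.25, 0.5)`, `(0.75, 0.5)`, `(0.75, 0.75)` and `(0.25, 0.75)`. Whether V needs flipping depends on the graphics API's texture origin.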

Check out the images below. Happiness!

I do not have any hard performance numbers to show though. I think these changes would only be evident in GPU performance (not CPU), and I actually don't have any tools installed to measure that. I will hunt for some in the near future, but any recommendations in the meantime would be highly appreciated!

UPDATE: I realised this a few days later and thought I should note it here in case anyone tries it. The face merging algorithm described above is flawed. It creates T-junctions (a vertex of one face sitting partway along the edge of a neighbouring face), which will produce rendering artifacts. The approach can be modified to correct this, but it is low on my priorities (I am not vertex-bound at the moment).
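To make the problem concrete, here is a small sketch that finds T-junctions in a set of axis-aligned rectangular faces. It only detects them and does not fix them, and the `(u, v, width, height)` face format and all function names are assumptions for the example.

```python
def corners(rect):
    u, v, w, h = rect
    return [(u, v), (u + w, v), (u + w, v + h), (u, v + h)]


def edges(rect):
    c = corners(rect)
    return [(c[i], c[(i + 1) % 4]) for i in range(4)]


def strictly_inside_edge(point, edge):
    """True if point lies on an axis-aligned edge but not at its endpoints."""
    (ax, ay), (bx, by) = edge
    px, py = point
    if ay == by:  # horizontal edge
        return py == ay and min(ax, bx) < px < max(ax, bx)
    # vertical edge
    return px == ax and min(ay, by) < py < max(ay, by)


def find_t_junctions(rects):
    """Return the sorted corner points that land mid-edge on another face."""
    found = set()
    for i, a in enumerate(rects):
        for j, b in enumerate(rects):
            if i == j:
                continue
            for p in corners(a):
                if any(strictly_inside_edge(p, e) for e in edges(b)):
                    found.add(p)
    return sorted(found)
```

With the two faces `(0, 0, 2, 2)` and `(1, 2, 2, 1)`, this reports `(1, 2)` and `(2, 2)`: each face has a corner in the middle of the other's edge. Two faces that share a complete edge, such as `(0, 0, 1, 1)` and `(1, 0, 1, 1)`, report nothing.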

Before - no simplification

After - with simplification!