An interesting idea occurred to me last evening. I was thinking about how to split the engine's logic into multiple threads to fit better into the modern multi-core environment. Of course I would stream audio data in separate threads as usual, but that's not an equal workload divide at all. Where else are there opportunities for parallelism? The rendering optimizer (automated batching, culling, …) maybe? Again, I don't think it would pay off. So I thought about what the video card's secret of parallelism might be: data independence.

When trying to modify a texture via pixel shader – such as when doing postprocessing – the video card won't let you read and write from / to the same buffer. If each pixel may depend on every other pixel, there needs to be a defined order in which pixels are processed to achieve deterministic behaviour. The video card couldn't just process two pixels at the same time, because each pixel might relate to the other pixels' data. By targeting read and write operations at the same buffer, we'd basically throw away the ability to process all data at once. So, what we do when doing texture postprocessing is the ping-pong method: read from texture A and write to texture B. Then, switch textures. Double buffering, that is.
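A minimal sketch of the ping-pong idea, reduced to a 1D "texture" of plain numbers (all names here are made up for illustration, this isn't actual shader code): every pass reads only from one buffer and writes only to the other, so no element ever depends on a value written in the same pass, and the writes could all happen in any order – or in parallel.

```python
# Hypothetical ping-pong sketch: a simple 1D box blur.
# Each pass reads ONLY from src and writes ONLY to dst,
# so every element of dst is independent of every other.

def blur_pass(src, dst):
    n = len(src)
    for i in range(n):
        left = src[max(i - 1, 0)]       # clamp at the edges
        right = src[min(i + 1, n - 1)]
        dst[i] = (left + src[i] + right) / 3.0

def blur(texture, passes):
    a = list(texture)       # "texture A": read-only source
    b = [0.0] * len(a)      # "texture B": write-only target
    for _ in range(passes):
        blur_pass(a, b)
        a, b = b, a         # ping-pong: swap roles each pass
    return a
```

A single pass spreads an impulse out to its neighbours: `blur([0.0, 0.0, 3.0, 0.0, 0.0], 1)` yields `[0.0, 1.0, 1.0, 1.0, 0.0]`. Writing the result back into `src` instead would make element `i` depend on whether `i - 1` was processed before or after it, which is exactly the nondeterminism the ping-pong scheme avoids.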

Sounds familiar: isn't that the problem we run into when trying to parallelize updating our GameObject data? Each GameObject might use data from any other GameObject, and we're reading and writing to the same set of data here. To make this possible at all, we'd need to lock each currently updating GameObject, which might lead to deadlocks and a lot of thread stall. Not good at all. But here comes the "pixel shader" solution: why not double buffer our GameObject data? Imagine each GameObject not storing one set of data, but two of them: the first one is last frame's data, the second one the current frame's. Any update method will only use each object's last-frame data set as a read-only source and store any new data in the next frame's data set as a write-only target. That way, we'd basically have a static source data buffer ("texture A") and a write-only target data buffer ("texture B"). We've decoupled all GameObjects from each other and are now able to process them all at once – in as many threads as we want.
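The scheme above could be sketched roughly like this (again a hypothetical illustration, with invented names and a trivial update rule): each object carries a `prev` and a `next` data set, updates read only `prev` and write only `next`, and a single swap at the end of the frame flips the buffers.

```python
# Hypothetical sketch of double-buffered GameObject updates.
# Updates read ONLY from .prev (their own or any other object's)
# and write ONLY to .next, so no locks are needed and all
# objects can be submitted to a thread pool at once.

from concurrent.futures import ThreadPoolExecutor

class GameObject:
    def __init__(self, pos, vel):
        self.prev = {"pos": pos, "vel": vel}  # last frame (read-only source)
        self.next = dict(self.prev)           # next frame (write-only target)

    def update(self, dt):
        # Reading another object's .prev here would be just as safe,
        # since nobody writes to .prev during the update phase.
        self.next["pos"] = self.prev["pos"] + self.prev["vel"] * dt
        self.next["vel"] = self.prev["vel"]

def update_all(objects, dt):
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(obj.update, dt) for obj in objects]
        for f in futures:
            f.result()  # wait and surface any exceptions
    # Only after ALL updates finished: flip the buffers.
    for obj in objects:
        obj.prev, obj.next = obj.next, obj.prev
```

The one serial point left is the buffer swap at the end of the frame, which is cheap: it's a pointer (here: attribute) exchange per object, not a copy.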

I haven't tested it yet and won't implement this for GameObjects or Components, as it would complicate things for the end user, who will script a lot of custom Components. I also really doubt it would pay off for the relatively low number of GameObjects there will be in a typical 2D game. Still, it might be an idea worth keeping in mind.