
Using SSE-based data structures


Phantom Hoover:
Incidentally, I found this article by a German author who seems to know what he's talking about, explaining why exactly this sort of 'optimisation' doesn't work. Basically, it comes down to the boring truth that modern CPUs already do floating-point adds so quickly that their cost is negligible next to memory access.

AdmiralRalwood:
Perhaps the most important point is that padding every vector out to four floats (so a quarter of each 16-byte vector is wasted space, and each vector is a third larger) means fewer of them fit in the cache at once. I ran into a similar problem when I experimented with replacing all the maps in the source with dense_hash_maps: my framerate stayed pretty much the same, probably because the extra memory usage generated more cache misses, cancelling out the performance gain.
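A rough way to see the cache cost is to compare a plain three-float vector with a padded 16-byte one. The type names and the 64-byte cache line below are assumptions for illustration, not FSO's actual definitions. One float in four is padding, so a quarter of each padded vector is wasted and an array of them is a third larger than the packed equivalent.

```cpp
#include <cstddef>

// Plain three-float vector, roughly the shape of a vec3d (assumed layout).
struct Vec3 {
    float x, y, z;
};

// Hypothetical SSE-friendly layout: one float of padding so each vector is
// 16 bytes and 16-byte aligned, i.e. exactly one XMM register wide.
struct alignas(16) Vec3Sse {
    float x, y, z, pad;
};

// Typical x86 cache-line size; an assumption, not queried from the hardware.
constexpr std::size_t kCacheLineBytes = 64;

// Number of complete vectors one cache line can hold (ignoring vectors that
// straddle a line boundary, which 12-byte vectors can do).
template <typename V>
constexpr std::size_t vectors_per_cache_line() {
    return kCacheLineBytes / sizeof(V);
}

// Bytes needed to store `count` vectors contiguously in an array.
template <typename V>
constexpr std::size_t footprint_bytes(std::size_t count) {
    return count * sizeof(V);
}

static_assert(sizeof(Vec3) == 12, "three packed floats");
static_assert(sizeof(Vec3Sse) == 16, "padded to four floats");
```

For example, `footprint_bytes<Vec3>(10000)` is 120000 bytes while `footprint_bytes<Vec3Sse>(10000)` is 160000 bytes, and a 64-byte line holds 5 complete `Vec3`s but only 4 `Vec3Sse`s.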

Phantom Hoover:
I kind of want to grep the FSO assembly output now to see whether any vectorised instructions are being generated at all, or whether the main benefit of SSE is just not having to use the godawfully slow x87 operations.

PeterMitsis:
Thanks for all the feedback thus far.

For those who were wondering what code I've tried so far: https://github.com/PeterMitsis/fs2open.github.com/tree/x86-sse-experiment

I've tried attaching two screenshots showing the misshapen ships (from the first training mission in FS2) to give a better idea of what they look like. Hopefully they show up.

@Krobar: I had a feeling that at least one of the remaining problems might be due to a packed buffer structure similar to the one you described. Any pointers on which modules, routines, or structures I should start reading up on to get a better idea of how this is done in the SCP?
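As a hypothetical illustration of how a packed buffer could produce misshapen geometry (not a diagnosis of the actual bug): if vertex positions are written as tightly packed x, y, z triples but read back with a stride derived from a 16-byte padded vector type, every vertex after the first picks up components from its neighbour. The type, function, and constant names below are made up for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical padded vector type (x, y, z plus one float of padding).
struct alignas(16) Vec3Sse {
    float x, y, z, pad;
};

// Read vertex `i` from a float buffer whose per-vertex stride is
// `stride_floats` floats. A tightly packed x,y,z buffer has a stride of 3.
Vec3Sse read_vertex(const std::vector<float>& buf, std::size_t i,
                    std::size_t stride_floats) {
    const std::size_t base = i * stride_floats;
    assert(base + 2 < buf.size() && "vertex index out of range for this stride");
    return Vec3Sse{buf[base], buf[base + 1], buf[base + 2], 0.0f};
}

// The stride the buffer was actually written with (three floats per vertex)
// versus the stride obtained by deriving it from the padded type's size.
constexpr std::size_t kPackedStride = 3;
constexpr std::size_t kPaddedStride = sizeof(Vec3Sse) / sizeof(float);  // 4
```

With a buffer `{0, 1, 2, 10, 11, 12, 20, 21, 22}`, `read_vertex(buf, 1, kPackedStride)` returns (10, 11, 12), while `read_vertex(buf, 1, kPaddedStride)` returns (11, 12, 20). Reading vertex 2 with the padded stride would run past the end of the buffer, which the assert catches in debug builds.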

(I don't know if anything will come of this experimenting. At the moment it's just a bit of dabbling here and there as a proof of concept.)

[attachment deleted by admin]

PeterMitsis:
@Phantom Hoover: Regarding the generated assembly, I can only speak for what I see on my machine (Ubuntu 16.04, g++ 5.4). There, the "godawfully slow x87 operations", as you put it, are merely replaced in the current codebase by equivalent scalar SSE instructions (ADDSS, MULSS, SUBSS, ...). I suspect the same is true of other compilers, but I have not verified that.
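To make the scalar-versus-packed distinction concrete, here is a small sketch using the standard SSE intrinsics from `<xmmintrin.h>`: `_mm_add_ss` typically compiles to ADDSS (VADDSS when AVX is enabled) and adds only the lowest float, while `_mm_add_ps` typically compiles to ADDPS and adds all four lanes in one instruction. The function names are hypothetical and the code is x86-only; it illustrates the instructions, not code from the FSO tree.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)

// Scalar SSE: _mm_add_ss adds only lane 0; lanes 1-3 are copied from `a`.
void add_scalar_sse(const float a[4], const float b[4], float out[4]) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    __m128 va = _mm_loadu_ps(a);  // unaligned load of four floats
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ss(va, vb));
}

// Packed SSE: _mm_add_ps adds all four lanes with a single instruction.
void add_packed_sse(const float a[4], const float b[4], float out[4]) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```

With `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}`, the scalar version yields {11, 2, 3, 4} and the packed version {11, 22, 33, 44}. Code built only from scalar float arithmetic will generally show the first kind of instruction in the assembly unless the compiler manages to auto-vectorise a loop.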
