Author Topic: Using SSE based data structures  (Read 3940 times)


Using SSE based data structures
Platform - Ubuntu 16.04
Target - x86-64

Although the current x86 builds for Linux and Windows use SSE instructions, the math is done one element at a time instead of on the vector as a whole. I am currently playing around on an Ubuntu 16.04 (64-bit) machine to see if I can upgrade the vector/matrix code to use the SSE vector instructions (where applicable).

To that end, I have first updated (no SSE math yet) ...

1. 'vec3d' to use four elements for easy mapping to the SSE xmm registers (although the fourth element is currently unused).
2. 'matrix' to use the new four-element 'vec3d' structures
3. Various places that use the 'a1d' interface to 'vec3d' or 'matrix' variables
4. Various places that use the 'a2d' interface to the 'matrix' variables
5. Checked for hard coded sizes of 'vec3d' and 'matrix'.
6. Checked for use of 'sizeof(vec3d)' and 'sizeof(matrix)'.

The good news is that everything builds, and the game appears to run (using an original installation of FS2). The bad news is that the ships are misshapen. Obviously I have missed something.  But what?  My guess is that something, somewhere (rendering code? ship models?) is expecting a 'vec3d' to be 12 bytes (3 floats) and/or a 'matrix' to be 36 bytes (9 floats) but I do not know what/where. If anyone has any suggestions, they would be greatly appreciated.
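For anyone following along, the size mismatch being guessed at here can be written down as a few compile-time checks. This is only a sketch: the type names below are made-up stand-ins, not the actual engine definitions, and it assumes a 4-byte float.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the engine types; names and layout are assumptions.
struct vec3d_packed { float x, y, z; };                  // original layout: 3 floats
struct alignas(16) vec3d_padded { float x, y, z, w; };   // 4 floats, w unused

struct matrix_packed { vec3d_packed v[3]; };             // 9 floats
struct matrix_padded { vec3d_padded v[3]; };             // 12 floats

static_assert(sizeof(vec3d_packed) == 12, "packed vector is 3 floats");
static_assert(sizeof(vec3d_padded) == 16, "padded vector is 4 floats");
static_assert(sizeof(matrix_packed) == 36, "packed matrix is 9 floats");
static_assert(sizeof(matrix_padded) == 48, "padded matrix is 12 floats");

// Byte offset of the i-th vector in a contiguous array of each layout.
constexpr std::size_t packed_offset(std::size_t i) { return i * sizeof(vec3d_packed); }
constexpr std::size_t padded_offset(std::size_t i) { return i * sizeof(vec3d_padded); }
```

Any code or file format that still assumes the 12/36 byte layout will step through memory with packed_offset while the new types live at padded_offset, and the two only agree for i == 0.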

 

Offline The E

  • He's Ebeneezer Goode
  • Moderator
  • 213
  • Nothing personal, just tech support.
Re: Using SSE based data structures
A few suggestions here: One, don't try to roll your own vector math library. I think it would be preferable to use a library like glm, because it already has a lot of the SSE/AVX stuff built in. Two, if you change the vector format, then every point where vector data is fed to OpenGL needs to be checked and adjusted too.
If I'm just aching this can't go on
I came from chasing dreams to feel alone
There must be changes, miss to feel strong
I really need life to touch me
--Evergrey, Where August Mourns

 

Offline General Battuta

  • Poe's Law In Action
  • 214
  • i wonder when my postcount will exceed my iq
Re: Using SSE based data structures
Holy **** I want screenshots of these ****ed up ships.

 
Re: Using SSE based data structures
Seconded. Get that **** on github!

(Probably this is something that could be controllably faked with shaders if you want crazy ****ed up Nagari ships.)
The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell.

 

Offline Kobrar44

  • On Suspended Sentence
  • 29
  • Let me tilerape it for you!
Re: Using SSE based data structures
This may differ, but I think I know what happens. I tried that, so I'll share my experience and conclusions.

My result was caused by an a2d matrix reference; once I fixed that, it went away.
There is, however, still one more thing to do. At least one.
The models are loaded into a buffer. Straight up. Byte after byte, POF model files are copied over. This is really bad, actually. Why?
Well, to have models we need vertices, and to have vertices we need vectors. In POF the vectors are, I believe, laid out like this: xyzxyzxyzxyz etc. FSO knows where the first vector is in the file because it knows the file structure, and it knows how many vectors there are, so it just saves pointers to all the vectors in the buffer in an array. So this would be the first place where things **** up. Right now it saves a pointer to xyz, then increments the pointer so it points at the next xyz, and so on. With a 16-byte vector, you save a pointer to xyzx and then to yzxy.
I figure I'd need to either rewrite the thing to store vectors neatly, which would be slower, or reparse the file, adding a w component to all vectors for compatibility with the rest of the code in place, but that would also be slower. A solution that isn't slower would be a new model format [that topic has been around for a while too] [it also requires a solution for backward compatibility]. I also realised that since the major bottleneck is the collision code, it is probably caused by memory reads rather than actual math [because of traversing trees], but any gain is welcome. I didn't get anywhere, because I didn't feel it would be worth it.
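The pointer-stepping problem described above can be reproduced in isolation. A minimal sketch, assuming a tightly packed buffer of 4-byte floats in host byte order; the helper names are made up for illustration and this is not the actual model loading code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Reads the first float of the i-th "vector" from a raw byte buffer,
// stepping `stride` bytes per vector. memcpy avoids aliasing problems.
inline float first_component(const unsigned char* buf, std::size_t i, std::size_t stride) {
    float x;
    std::memcpy(&x, buf + i * stride, sizeof x);
    return x;
}

// Builds a packed xyzxyz... buffer holding (1,2,3), (4,5,6), (7,8,9), (10,11,12).
inline std::vector<unsigned char> make_packed_buffer() {
    const float data[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    std::vector<unsigned char> buf(sizeof data);
    std::memcpy(buf.data(), data, sizeof data);
    return buf;
}
```

Stepping by 12 bytes, vector 1 starts at 4 (the x of the second vertex). Stepping by 16 bytes, vector 1 starts at 5 (its y) and vector 2 starts at 9, which is the "xyzx then yzxy" pattern. Only three 16-byte vectors fit in this 48-byte buffer, so asking for a fourth would read past the end.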

EDIT: OR I am totally wrong on everything. That's an option too.
Also, I'm surprised you got no errors for referencing random memory after the buffers.
EDIT2: Also keep in mind, in order to actually utilize SSE properly, vectors need to be memory aligned to 16 bytes.
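On the alignment point, a rough sketch of what "aligned to 16 bytes" means in C++11 terms. The type name is hypothetical; alignas and alignof are standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical padded vector type; alignas(16) asks the compiler to place
// every object of this type on a 16-byte boundary.
struct alignas(16) vec4f { float x, y, z, w; };

static_assert(alignof(vec4f) == 16, "type must be 16-byte aligned");
static_assert(sizeof(vec4f) == 16, "array elements must stay 16 bytes apart");

// True if the address is a multiple of 16.
inline bool is_aligned_16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// True if every element of the array sits on a 16-byte boundary.
inline bool all_aligned(const vec4f* v, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        if (!is_aligned_16(&v[i])) return false;
    return true;
}
```

A pointer into the middle of a packed xyzxyz buffer (for example 12 bytes past an aligned start) fails this check, and the aligned SSE load and store instructions fault on such addresses, so pointing the new type straight at file data is not an option even if the stride were fixed.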
« Last Edit: August 14, 2016, 03:56:41 pm by Kobrar44 »
Oh guys, use that [ url ][ img ][ /img ][ /url ] :/

 
Re: Using SSE based data structures
Incidentally, I found this article by some German who seems to know what he's talking about, giving reasons why exactly this sort of 'optimisation' doesn't work. Basically it comes down to the boring truth that modern CPUs can already do floating-point adds so quickly that they're negligible next to memory access.
« Last Edit: August 14, 2016, 04:40:27 pm by Phantom Hoover »

 

Offline AdmiralRalwood

  • 211
  • The Cthulhu programmer himself!
Re: Using SSE based data structures
Perhaps the most important point is that making all vectors take up a third more memory (16 bytes instead of 12) means fewer of them can fit into the cache at one time; I ran into a similar problem when I experimented with replacing all maps in the source with dense_hash_maps. My framerate stayed pretty much the same, probably because the increased memory usage wound up cancelling out the improved performance because more cache misses were being generated.
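To put rough numbers on the cache argument, here is a back-of-the-envelope sketch. The 64-byte line size is an assumption (it is typical for x86-64 but not guaranteed), and the function name is made up.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kCacheLineBytes = 64;  // assumption: typical x86-64 line size

// Cache lines needed to hold `count` contiguous vectors of `vec_bytes` each,
// rounded up to whole lines.
constexpr std::size_t lines_needed(std::size_t vec_bytes, std::size_t count) {
    return (count * vec_bytes + kCacheLineBytes - 1) / kCacheLineBytes;
}

static_assert(lines_needed(12, 1024) == 192, "1024 packed vectors");
static_assert(lines_needed(16, 1024) == 256, "1024 padded vectors");
```

So the same 1024 vertices touch 256 lines instead of 192 once padded. Whether that actually costs frames depends on access patterns, which this arithmetic says nothing about.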
Ph'nglui mglw'nafh Codethulhu GitHub wgah'nagl fhtagn.

schrödinbug (noun) - a bug that manifests itself in running software after a programmer notices that the code should never have worked in the first place.

When you gaze long into BMPMAN, BMPMAN also gazes into you.

"I am one of the best FREDders on Earth" -General Battuta

<Aesaar> literary criticism is vladimir putin

<MageKing17> "There's probably a reason the code is the way it is" is a very dangerous line of thought. :P
<MageKing17> Because the "reason" often turns out to be "nobody noticed it was wrong".
(the very next day)
<MageKing17> this ****ing code did it to me again
<MageKing17> "That doesn't really make sense to me, but I'll assume it was being done for a reason."
<MageKing17> **** ME
<MageKing17> THE REASON IS PEOPLE ARE STUPID
<MageKing17> ESPECIALLY ME

<MageKing17> God damn, I do not understand how this is breaking.
<MageKing17> Everything points to "this should work fine", and yet it's clearly not working.
<MjnMixael> 2 hours later... "God damn, how did this ever work at all?!"
(...)
<MageKing17> so
<MageKing17> more than two hours
<MageKing17> but once again we have reached the inevitable conclusion
<MageKing17> How did this code ever work in the first place!?

<@The_E> Welcome to OpenGL, where standards compliance is optional, and error reporting inconsistent

<MageKing17> It was all working perfectly until I actually tried it on an actual mission.

<IronWorks> I am useful for FSO stuff again. This is a red-letter day!
* z64555 erases "Thursday" and rewrites it in red ink

<MageKing17> TIL the entire homing code is held up by shoestrings and duct tape, basically.

 
Re: Using SSE based data structures
I kind of want to grep the FSO assembly output now to see if any vectorised instructions are even being generated, or if the main benefit of SSE is just not having to use the godawfully slow x87 operations.

 
Re: Using SSE based data structures
Thanks for all the feedback thus far.

For those who were wondering what code I had tried thus far ... https://github.com/PeterMitsis/fs2open.github.com/tree/x86-sse-experiment 

I've tried attaching two screenshots showing the misshapen ships (from the first training mission in FS2) to give a better idea of what it looks like. Hopefully they show up.

@Kobrar44 ... I had a feeling that (at least one of) the remaining problems may be due to a packed buffer structure similar to the one you described. Any pointers on which modules/routines/structures I should start reading up on to get a better idea of how this is done in the SCP?

(I don't know if anything will come of this experimenting.  At the moment, it is just a little bit of dabbling here and there for proof-of-concept.)

[attachment deleted by admin]

 
Re: Using SSE based data structures
@Phantom Hoover - Regarding the generated assembly output, I can only speak for what I am seeing on my machine (Ubuntu 16.04, g++ 5.4). There, the "godawfully slow x87 operations", as you put it, are merely replaced in the current codebase with the equivalent SSE scalar instructions (such as ADDSS, MULSS, SUBSS, ...). I suspect the same is true for the other compilers, but I have not verified that.
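For readers unfamiliar with the distinction, here is a small sketch of scalar versus packed addition. The struct and function names are made up for illustration; _mm_load_ps, _mm_add_ps and _mm_store_ps are the standard intrinsics from xmmintrin.h. What a compiler emits for the scalar version depends on compiler and flags, so the comments describe the typical case only.

```cpp
#include <cassert>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// Hypothetical four-float vector, aligned so aligned loads/stores are legal.
struct alignas(16) vec4f { float v[4]; };

// Element-by-element add. Without auto-vectorization this typically becomes
// one scalar add (ADDSS on SSE targets) per component.
inline vec4f add_scalar(const vec4f& a, const vec4f& b) {
    vec4f r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Explicit packed add: one ADDPS covers all four lanes.
// Falls back to the scalar version when SSE is not available.
inline vec4f add_packed(const vec4f& a, const vec4f& b) {
    vec4f r;
#if SKETCH_HAVE_SSE
    _mm_store_ps(r.v, _mm_add_ps(_mm_load_ps(a.v), _mm_load_ps(b.v)));
#else
    r = add_scalar(a, b);
#endif
    return r;
}
```

Both functions return the same values; the difference is only in how many instructions do the work.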

 

Offline Echelon9

  • 210
Re: Using SSE based data structures
Seconding the suggestion to use a third party math library, if we change ours, rather than hand rolling.

That said, perhaps you could write unit tests for the existing 3D math using our brand new unit test harness, and then validate your SSE refactor against them. This should help prove the correctness of that component, if the test coverage is sufficient.
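As an illustration of the kind of check meant here, a minimal sketch using plain functions rather than any particular test framework. All names are hypothetical, and the "refactored" function is just a placeholder for whatever SSE version gets written.

```cpp
#include <cassert>
#include <cmath>

struct vec3_ref { float x, y, z; };                  // stand-in for the existing type
struct alignas(16) vec4_new { float x, y, z, w; };   // stand-in for the refactored type

// Reference implementation: the behaviour the refactor must preserve.
inline float dot_ref(const vec3_ref& a, const vec3_ref& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Candidate under test. A real SSE version would go here; it must ignore w.
inline float dot_new(const vec4_new& a, const vec4_new& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Relative comparison, since a vectorized version may sum in a different order.
inline bool nearly_equal(float a, float b, float tol = 1e-5f) {
    float scale = std::fmax(1.0f, std::fmax(std::fabs(a), std::fabs(b)));
    return std::fabs(a - b) <= tol * scale;
}

// Compares candidate and reference over a small grid of inputs, deliberately
// putting garbage in the unused w lane to catch code that lets it leak in.
inline bool dot_matches_reference() {
    const float samples[] = {-2.5f, -1.0f, 0.0f, 0.5f, 3.0f};
    for (float x : samples)
        for (float y : samples)
            for (float z : samples) {
                vec3_ref a{x, y, z}, b{z, x, y};
                vec4_new a4{x, y, z, 123.0f}, b4{z, x, y, -456.0f};
                if (!nearly_equal(dot_ref(a, b), dot_new(a4, b4))) return false;
            }
    return true;
}
```

The garbage-in-w case is worth keeping in whatever form the real tests take, because a four-lane multiply-and-sum will silently include w unless it is zeroed or masked.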

 
Re: Using SSE based data structures
I've just checked, by the way, and the only places where SSE vector instructions ('addps' and friends) are emitted are in libcode.a and nanovg.

 

Offline z64555

  • 210
  • Self-proclaimed controls expert
Re: Using SSE based data structures
The models are loaded into a buffer. Straightup. Byte after byte, pof model files are copied over. This is really bad actually. Why?

Well, to have models we need vertices and to have vertices we need vectors. in POF the vectors are I believe setup like this: xyzxyzxyzxyz etc. FSO knows where in the file is the first vector because it knows the file structure, and knows how many vectors there are, so it just saves all the pointers to the vectors from the buffer in an array. So this would be the first place where things **** up.

So you'll need to create a buffer with the correct sized vectors and then copy/move the smaller vectors into them. Not difficult at all, but it will take a bit of processor time to form the buffer/cache.
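A minimal sketch of that copy step, assuming the source is a tightly packed run of 4-byte floats in host byte order. The function and type names are made up, and this is not the actual loader code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical padded vector type used by the rest of the code.
struct alignas(16) vec4f { float x, y, z, w; };

// Expands `count` packed xyz triples starting at `src` into padded vectors.
// The w lane is set to zero. memcpy is used because `src` may be unaligned.
inline std::vector<vec4f> expand_packed(const unsigned char* src, std::size_t count) {
    std::vector<vec4f> out(count);
    for (std::size_t i = 0; i < count; ++i) {
        float xyz[3];
        std::memcpy(xyz, src + i * sizeof xyz, sizeof xyz);
        out[i] = vec4f{xyz[0], xyz[1], xyz[2], 0.0f};
    }
    return out;
}
```

This is one pass over the data at load time, so the cost is paid once per model rather than per frame. One caveat: before C++17, std::vector is not guaranteed to honour alignas(16) on every platform, so an aligned allocator may be needed if aligned SSE loads are used on the result.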


Secure the Source, Contain the Code, Protect the Project
chief1983

------------
funtapaz: Hunchon University biologists prove mankind is evolving to new, higher form of life, known as Homopithecus Juche.
z64555: s/J/Do
BotenAlfred: <funtapaz> Hunchon University biologists prove mankind is evolving to new, higher form of life, known as Homopithecus Douche.