Topic: Antipodes 6 (r6533)

Zacam
Should be functionally identical to Trunk/Nightly Builds r6532, only faster, cleaner and better.

This is thanks to taylor's contributions for engine enhancements. The full list of enhancements is directly quoted below the builds.

Windows Builds
FSO_Win-Antipodes_6_SSE-r6533.7z
MD5: EEFEA55D7AEECA0E6D25E7CD6E928C96

FSO_Win-Antipodes_6_SSE2-r6533.7z
MD5: F6D7377D6922852070834E54F96568A2

Linux Builds
(awaiting linkage for 32 bit)

64 Bit:
fso-LINUX-Antipodes6-Inferno-x64.tar.bz2
MD5: dc6e9cb02b5d90b5a31ca6f670b239f6

Mac OS-X Builds
FS2_Open-Inferno_Ant6_r6533
MD5: f79c3c3ebb8bbe097c9d7a6ddebe9998

In relation to the shaders, there are replacement files for Blur and PostProcessing:
(to be placed in data\effects)
Ant_6-Shaders.7z
MD5: 3A2D4C6D77E941FA7E7F57D5CEBA4850

Because there are changes to the handling of POF data and cache generation, the following VP (in 7z format) is being provided with the new cache files for the models released in the 3.6.12 MediaVPs.
3612_Ant-Cache.7z
MD5: C5E8CDBE1A49354DFF160F9467129BB9

So, here is a basic rundown of the go_faster changes:

One vertex buffer per model.
The original task for go_faster.  The current trunk code uses one VBO per submodel and keeps the index list local.  That means that we have a lot of VBOs, many of which are very small, and we have to do a lot of switching between them.  And buffer changes are one of the most costly operations in OpenGL.  Put simply, the way that the code was originally written is quite possibly the least efficient method of doing it.  So the goal was simple: have just one VBO per model, and use IBOs properly.
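To picture the "one VBO per model" layout without any OpenGL calls, here is a minimal sketch. It assumes hypothetical `Vertex`, `Submodel` and `MergedModel` types (none of these are the engine's real structures) and only shows the bookkeeping: every submodel's vertices go into one shared list, its indices are rebased, and each submodel keeps an index range that it would draw from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; not the engine's real structures.
struct Vertex { float x, y, z; };

struct Submodel {
    std::vector<Vertex>        verts;    // vertices local to this submodel
    std::vector<std::uint32_t> indices;  // indices into `verts`
};

struct IndexRange {
    std::size_t first;  // offset of the submodel's first index in the shared list
    std::size_t count;  // number of indices that belong to the submodel
};

struct MergedModel {
    std::vector<Vertex>        verts;    // would back the single VBO
    std::vector<std::uint32_t> indices;  // would back the IBO
    std::vector<IndexRange>    ranges;   // one draw range per submodel
};

// Concatenate all submodels into one vertex list and one index list,
// rebasing each submodel's indices by the number of vertices before it.
MergedModel merge_submodels(const std::vector<Submodel>& subs)
{
    MergedModel out;
    for (const Submodel& s : subs) {
        const std::uint32_t base = static_cast<std::uint32_t>(out.verts.size());
        out.ranges.push_back({out.indices.size(), s.indices.size()});
        out.verts.insert(out.verts.end(), s.verts.begin(), s.verts.end());
        for (std::uint32_t i : s.indices) {
            assert(i < s.verts.size());  // each index must be valid locally
            out.indices.push_back(base + i);
        }
    }
    return out;
}
```

With a layout like this, a renderer would bind the buffer once per model and issue one indexed draw per range instead of switching buffers for every submodel.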

Even in the worst case, the performance gain is easily noticeable.  But the performance gain increases for every submodel that a model has.  A small fighter will see a nice performance boost from the simplified and optimized code, but a large ship with lots of turrets and other such submodels can get a tremendous performance improvement.  And what's more, this also represents a change which allows OpenGL to better optimize memory usage.

But there remains room for future improvement as well.  How VBOs are handled can still be optimized much further.  A manager could be implemented which handles VBOs and tries to stuff as much info into as few of them as possible.  VBOs work best when there are few of them and when they are around a certain size.  That is more of a long term goal however, as it would require a good bit of graphics work; not a rewrite, but more than a weekend or two to code (if done well).  In the short term it is possible to make more use of the VBO that we have, moving more info into it.  Things such as glow points and insignia and possibly even thruster glows can be added to the VBO.  It should make those things more efficient, optimize code and memory, and allow greater possibilities with those features because of shader use.

Updated IBX code.
The IBX files were originally written just for dev use and were only meant to exist in the code base for a few months, yet all of these years later they are still around. :(

The new code, using the ".bx" extension, addresses every issue that I have had with the current IBX code.  First off, it's just a bunch of ints now.  The old code stored the actual list of vertex data, which was a pain to save and read back and also introduced some machine specific errors into the equation.  The new code takes advantage of one simple fact: all of that vertex info that was saved to the IBX file is always present.  So the new code just stores the original position of the indexed vertexes and simply builds the new data by copying it from what already exists.  No muss, no fuss.
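As a rough illustration of "store positions, copy from what already exists", here is a minimal sketch. The `Vertex` type, the function name and the idea of treating a bad position as a stale cache are all assumptions for the example; the actual ".bx" layout is not described in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex type for illustration only.
struct Vertex { float x, y, z; };

// source : vertex data that is always available from the model itself.
// stored : the ints read back from the cache file, each one the original
//          position of a vertex in `source`.
// out    : receives the rebuilt vertex list, in cache order.
// Returns false (leaving `out` empty) if any stored position is out of
// range, which a caller could treat as a stale cache file.
bool rebuild_from_positions(const std::vector<Vertex>& source,
                            const std::vector<std::int32_t>& stored,
                            std::vector<Vertex>& out)
{
    out.clear();
    out.reserve(stored.size());
    for (std::int32_t pos : stored) {
        if (pos < 0 || static_cast<std::size_t>(pos) >= source.size()) {
            out.clear();
            return false;
        }
        out.push_back(source[static_cast<std::size_t>(pos)]);
    }
    return true;
}
```

Because the file only holds integers, nothing in it depends on how a given machine rounds or stores floating point vertex data.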

Additionally the old code would store the index data as either shorts or ints based on how many verts there were.  The new code only stores ints and then converts to shorts if it can.  This way we still get the better optimized index data for the graphics card, but have a much easier file format to deal with.
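The "store ints, convert to shorts if it can" step can be sketched like this. The function names are made up for the example, and it assumes the shorts are unsigned 16-bit values (as used with GL_UNSIGNED_SHORT index buffers), so the cutoff is 65535.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// True when every index can be represented as an unsigned 16-bit value.
bool fits_in_16_bits(const std::vector<std::uint32_t>& indices)
{
    for (std::uint32_t i : indices) {
        if (i > std::numeric_limits<std::uint16_t>::max()) {
            return false;
        }
    }
    return true;
}

// Convert a 32-bit index list to 16-bit. Only call this after
// fits_in_16_bits() returned true for the same list.
std::vector<std::uint16_t> narrow_indices(const std::vector<std::uint32_t>& indices)
{
    assert(fits_in_16_bits(indices));
    std::vector<std::uint16_t> out;
    out.reserve(indices.size());
    for (std::uint32_t i : indices) {
        out.push_back(static_cast<std::uint16_t>(i));
    }
    return out;
}
```

The file format stays uniform (always ints), and the choice of index width becomes a load-time decision rather than something baked into the file.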

Strive to get rid of immediate mode as much as possible.
Immediate mode is slow.  If used a lot it really hurts performance.  Most of the performance gain from the addition of the HTL code is simply from the fact that it reduced the dependence on immediate mode for rendering.  But the thing is, you don't require HTL to do that, and it doesn't require advanced hardware or new OpenGL versions either.  We aren't even taking full advantage of OpenGL version 1.1 features.

So, things are largely in place now with the gr_render() function.  This is a replacement for the existing/old gr_tmapper() function, only using arrays rather than immediate mode for rendering.  The main difference in using it is that it takes an array of vertex structs rather than an array of pointers to vertex structs.  I have already converted some parts of the code to use gr_render() instead of gr_tmapper(), but there are still plenty of areas that could be converted.  And the nice thing is that it doesn't take any real graphics knowledge or anything to do it; any coder here should have the skills needed to make the changes.
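For code that currently builds a list of vertex pointers, the conversion mostly comes down to handing over a contiguous array instead. A minimal sketch of that step follows; the `Vertex` struct and `flatten_vertex_list()` are hypothetical, and the real signatures of gr_render() and gr_tmapper() are not shown in this post, so no call to either is included.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex type for illustration only.
struct Vertex { float x, y, z, u, v; };

// Copy a pointer list (the old calling style) into one contiguous array
// (the new calling style), preserving the order of the list.
std::vector<Vertex> flatten_vertex_list(const Vertex* const* ptrs, std::size_t count)
{
    std::vector<Vertex> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        assert(ptrs[i] != nullptr);  // every entry must point at a vertex
        out.push_back(*ptrs[i]);
    }
    return out;
}
```

A contiguous array is what array-based rendering wants anyway, since the whole block can be handed to the driver in one go instead of being walked vertex by vertex.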

The one real downside with gr_render() is that right now it can't handle things requiring TMAP_FLAG_CORRECT in order to render correctly.  For the most part this is just models however, and so it would only have a problem in -nohtl mode.  Getting it to work with shaders wouldn't be a difficult task, but having it work without shaders is another matter.  Someone may come up with a brilliant idea to solve that little dilemma though.

And an implementation issue with opengl_render_internal() is that it doesn't really use the texture matrix stack all that smartly.  We can only depend on it being 2 deep, so it's possible to go over that in some situations.  Currently it should only be used with interface graphics, so the chances of it getting messed up are slim.  Still, that is something that needs to be addressed.

Memory usage.
Overrides for global new/new[]/delete/delete[] for one thing.  Whether or not it's the best thing to do aside, it's better than nothing, which is what we have now.  Either way it addresses a problem in the code.

But the primary thing: reduced the size of the vertex struct.  That one struct is made heavy use of throughout the code.  The problem was that it had things in it which were not really used.  Originally 80-bytes, it now sits at 42-bytes.  So a reduction of nearly half.  And if you wonder whether 38 bytes really matters, realize that every vert in a model has one of these.  So you have a model with 30,000 verts, its memory usage went from about 2.3 meg to about 1.2 meg.  That greatly reduces the amount of memory necessary to load and process a model.
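The numbers above are easy to check. A small sketch of the arithmetic, using only the figures from the post (80 and 42 bytes per vert, 30,000 verts) and assuming "meg" means 1024 * 1024 bytes; the function name is made up for the example:

```cpp
#include <cassert>
#include <cstddef>

// Memory taken by `vert_count` vertex structs of `bytes_per_vert` bytes
// each, expressed in units of 1024 * 1024 bytes.
constexpr double vertex_memory_meg(std::size_t vert_count, std::size_t bytes_per_vert)
{
    return static_cast<double>(vert_count * bytes_per_vert) / (1024.0 * 1024.0);
}

// The per-vertex saving quoted in the post.
static_assert(80 - 42 == 38, "80 bytes down to 42 bytes saves 38 bytes per vert");
```

For 30,000 verts this works out to 2,400,000 bytes (roughly 2.29 meg) before and 1,260,000 bytes (roughly 1.20 meg) after.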

Better optimize various states which don't change often.
There are numerous things in the OpenGL code which are not quite done as efficiently as they could be.  Some of these are states which are set every time that a texture is made active, even though those state settings never change once the texture is created and don't need to be set again.  Changing the code to handle that in a more intelligent manner is not a big thing overall, but it leads to both cleaner and easier to work with code as well as offering better performance.

Another big offender is the model render code.  It often checks and sets things which never change once a model starts rendering.  So instead of doing those checks for every submodel, or even worse every texture on every submodel, I just moved them to only be done once.  Simple enough, but greatly improves code readability and makes things faster too.
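One common way to avoid re-setting state that has not changed is to remember the last value applied and skip the call when it matches. The sketch below shows that general pattern only; `CachedState` is a hypothetical helper and is not taken from the go_faster code.

```cpp
#include <cassert>
#include <utility>

// Hypothetical wrapper that remembers the last value applied to some
// piece of render state.
template <typename T>
class CachedState {
public:
    // Calls `apply(value)` only when `value` differs from the last value
    // applied, or when nothing has been applied yet.
    // Returns true if `apply` was actually called.
    template <typename Apply>
    bool set(const T& value, Apply&& apply)
    {
        if (has_value_ && value == current_) {
            return false;  // redundant change, skip the expensive call
        }
        std::forward<Apply>(apply)(value);
        current_ = value;
        has_value_ = true;
        return true;
    }

private:
    T    current_{};
    bool has_value_ = false;
};
```

In a renderer the `apply` callback would be the actual state-setting call, so repeated requests for the same value cost one comparison instead of a driver call.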

Shaders & Hery's code.
I really don't have any polite things to say about Hery's code, so I'm not even going to bother trying.  This code should simply have never been permitted into trunk.  And that it represents a precursor to what was planned for a code rewrite just scares the hell out of me.  This stuff is basically just an alien parasite that got latched onto the code tree for little more than slowing things down and making coders nauseous.  I tried to work with it, but it just was not possible.

So, I ripped it out.

The first thing that I noticed when I replaced the shader code with the old code was that Hery's new code was about 20% slower than the code it replaced.  That is a considerable performance boost for something that took all of 10 minutes of work.  And the old code is far easier to read and understand.  There are quite a number of things which could be done to the shader setup to squeak out a bit of extra performance, and better hardware compatibility, but the new code was just so difficult to work with that those things were impossible without a rewrite.  The old code isn't great, I wrote it, I should know.  But the old code was written with the intention of being replaced either in whole or in part by something more efficient.  I made a few small changes to the code to both benefit their use as well as to improve performance a little bit beyond what it had originally.  I'm also hoping that it will allow another coder to more easily implement some larger changes later on for both better performance and compatibility.

The changes are rather minor, but should be noted.  First, all shaders will now have a SHADER_MODEL define available to them, which will identify to the shader whether the hardware is at a SM2.0 (#if SHADER_MODEL == 2), SM3.0 or SM4.0 level.  What this means for shader developers is that they can write in some more advanced features and not have to break shaders for lower-end hardware.  This will also allow for shaders to be used by SM2.0 hardware, if the shader author is willing to make some sacrifices.  Secondly, if the model shaders can't be used on the hardware for some reason, but GLSL is supported, then shader use stays available for other things instead of simply being disabled; only models fall back to being rendered via the fixed-function pipeline.
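To show how a define like this could reach a shader, here is a minimal sketch that prepends the define to the shader source before compiling. How the engine actually injects SHADER_MODEL is not described in this post, so `with_shader_model()` and the example fragment are assumptions; the sketch also assumes the source has no #version line, which would have to stay first.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: prepend a SHADER_MODEL define to shader source.
// `shader_model` is expected to be 2, 3 or 4 (SM2.0, SM3.0, SM4.0).
std::string with_shader_model(int shader_model, const std::string& source)
{
    assert(shader_model >= 2 && shader_model <= 4);
    return "#define SHADER_MODEL " + std::to_string(shader_model) + "\n" + source;
}

// Example of how a shader could branch on the define. The comments stand
// in for real shader code.
const char* const example_fragment = R"(
#if SHADER_MODEL == 2
    // cheaper path that SM2.0 hardware can handle
#else
    // fuller path for SM3.0 and SM4.0 hardware
#endif
)";
```

A shader written this way keeps one source file for all hardware levels, with the more advanced features fenced off behind the define.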

And since people would surely complain about post-processing being gone with the code rip, I rewrote it as well.  So now the post-processing code is cleaned up and written in the same basic way as the rest of the graphics code.  It should be less buggy, less resource intensive, and just easier to figure out how the damn code works.  I did cut a few corners, since I only worked on it this past weekend, but it should be pretty easy to follow I hope.  The corner cutting was mainly to keep the code as self-contained as possible so that it could be edited on later without really messing up other parts of the graphics code.  This means that there are a few magic numbers in there, but I tried to comment everything so that it makes sense.  The new code should be a functional equivalent of the original code, so how it worked before should be how it works now (aside from the crashing and all ;)).  Graphically it should produce the exact same results in other words.  What was not implemented was the DoF code, since it was only an example in the old code and disabled.  I also added the ability to see post-processing effects in the ship lab.  This should make it a little quicker and easier to see what effects will look like without having to load up a mission.  As an additional bonus, the new code is faster too.  Plus, users can get post-processing without bloom by using "-bloom_intensity 0" (should they want/need such a thing for some reason).

Also, in the go_faster archive, there is a modified blur shader (post-v and blur-f).  This was modified for a couple of reasons, the first being compatibility.  The new version of the shader is about 10% faster in FPS terms, has no discernible difference in image quality, works with SM2.0 hardware, and remains compatible with the older code/builds.  Just drop these files in place and get around a 10% FPS boost with post-processing, even with existing trunk builds.  And with the GLSL changes I mentioned earlier, this also means that SM2.0 hardware people could take advantage of post-processing (for the most part), even if they aren't able to use shaders for model rendering.
« Last Edit: September 28, 2010, 01:36:23 am by Zacam »

 

FUBAR-BDHR
One thing when using this: if you are using shaders, make sure that your card/drivers support at least shader version 3.0.  If not, update or enable no glsl, as the code will no longer disable it automatically, resulting in slower performance instead of faster.

 

The E
However, if your card supports at least Shader Model 2, you can use these shaders: http://blueplanet.fsmods.net/E/effects.7z

Unpack these to mediavps_3612/data/effects

 

Swifty
FYI, the new HUD framework isn't in Antipodes yet. It will be as soon as I crank out a patch against Antipodes that I know for sure will work properly with the graphics tweaks. Likely in a couple days.

 

Fury
So I wonder how much of a performance improvement there is in something like WiH? Some of those optimizations sound like they'd help a lot in Steve-O's ships.

 

The E
There is some performance boost. Optimizations are still necessary, though.

 

Darius
Scenes with Steve-O's ships (and the Solaris especially) have definitely got a framerate improvement.

 

General Battuta
This is amazing. The VBO change alone had me drooling.

 

Hades
This build is awesome, I get doubled FPS with it.

Making a note here: HUGE SUCCESS!!
« Last Edit: September 28, 2010, 10:27:21 am by Hades »

 

Shivan Hunter
This is most definitely a win.

80FPS in the beginning of Delenda Est, where I was getting 30-40 with those frakking Karunas on screen.

bp-massivebattle is even increased by a FPS or two... of course, using that thing as a benchmark is not even remotely fair. :P

 

General Battuta
Hero_Swe reports a jump of 60 FPS, to 120, with 2 Karunas onscreen.

 

Topgun
Taylor is my hero.

 

Sushi
Just curious, what kind of drops do you get once the action starts?

I find that once the models get big and complex enough, the bottleneck isn't the GPU rendering them... it's the CPU trying to process collision detection on them.

 

Satellight
In Delenda Est, I never had difficulty playing (with Antipodes) due to this ****ing "technical plague". I use V-sync so I don't know how far I can go  ;7 but as far as I saw, it never went under 35-40 FPS, even in the middle of the battle (PEW PEW PEW  :D)
Spec : i7760, 4Gb Ram, HD 5850 XFX BE

EDIT : @Sushi : my drop to 35-40 FPS only happens at the very beginning, when the Karuna appears, and only for 1/2 second or less.
« Last Edit: September 28, 2010, 12:09:13 pm by Satellight »

 

chief1983
Rufus Taylor, he's the man.

 

MatthTheGeek
Awesome improvement. DE actually got playable for me. Reached 30 fps at some points, although I still stayed much of the time between 8 and 10, full screen with all settings down.

 

chief1983
I uploaded some of the mediashare files here if anyone wants a mirror.

 

CKid
When I try to select the build from the launcher I get an error and the launcher crashes on me. The only way that I can run the build is to actually open the freespace folder and click on the .exe itself.

 

General Battuta
When I try to select the build from the launcher I get an error and the launcher crashes on me. The only way that I can run the build is to actually open the freespace folder and click on the .exe itself.

Use 5.5g or WXLauncher, not 5.5f.

It's 5, not 3 --The E
« Last Edit: September 28, 2010, 09:06:03 pm by The E »

 

CKid
5.5g did the same thing, but the WXLauncher worked out fine once I figured out how it works. Thank you for the quick response.