Author Topic: Antipodes 6 (r6533) (Read 13676 times)

Zacam · **on:** September 28, 2010, 12:45:44 am

Should be functionally identical to Trunk/Nightly Builds r6532, only faster, cleaner and better.

This is thanks to taylor's contributions for engine enhancements. The full list of enhancements is directly quoted below the builds.

Windows Builds
FSO_Win-Antipodes_6_SSE-r6533.7z
MD5: EEFEA55D7AEECA0E6D25E7CD6E928C96

FSO_Win-Antipodes_6_SSE2-r6533.7z
MD5: F6D7377D6922852070834E54F96568A2

Linux Builds
(awaiting linkage for 32 bit)

64 Bit:
fso-LINUX-Antipodes6-Inferno-x64.tar.bz2
MD5: dc6e9cb02b5d90b5a31ca6f670b239f6

Mac OS-X Builds
FS2_Open-Inferno_Ant6_r6533
MD5: f79c3c3ebb8bbe097c9d7a6ddebe9998

In relation to the shaders, there are replacement files for Blur and PostProcessing:
(to be placed in data\effects)
Ant_6-Shaders.7z
MD5: 3A2D4C6D77E941FA7E7F57D5CEBA4850

Because there are changes to the handling of POF data and cache generation, the following VP (in 7z format) is being provided with the new cache files for the models released in the 3.6.12 MediaVPs.
3612_Ant-Cache.7z
MD5: C5E8CDBE1A49354DFF160F9467129BB9

Quote from: taylor on September 22, 2010, 03:16:34 am

So, here is a basic rundown of the go_faster changes:

One vertex buffer per model.
The original task for go_faster. The current trunk code uses one VBO per submodel and keeps the index list local. That means that we have a lot of VBOs, many of which are very small, and we have to do a lot of switching between them. And buffer changes are one of the most costly operations in OpenGL. Put simply, the way that the code was originally written is quite possibly the least efficient method of doing it. So the goal was simple: have just one VBO per model, and use IBOs properly.
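To make the one-VBO-per-model idea concrete, here is a minimal CPU-side sketch, not the actual go_faster code. All type and function names (Submodel, MergedModel, merge_submodels) are hypothetical. It packs every submodel's vertices into one shared list and rebases the indices, so each submodel becomes just a range inside a single index list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not the engine's real types.
struct Vec3 { float x, y, z; };

struct Submodel {
    std::vector<Vec3> verts;             // vertex data local to this submodel
    std::vector<std::uint32_t> indices;  // indices relative to `verts`
};

struct DrawRange {
    std::size_t first_index;  // offset into the shared index list
    std::size_t index_count;  // number of indices belonging to the submodel
};

struct MergedModel {
    std::vector<Vec3> verts;             // would be uploaded once as the single VBO
    std::vector<std::uint32_t> indices;  // would be uploaded once as the IBO
    std::vector<DrawRange> ranges;       // one entry per submodel
};

// Pack all submodels into one vertex list and one index list.
MergedModel merge_submodels(const std::vector<Submodel>& submodels) {
    MergedModel out;
    for (const Submodel& sm : submodels) {
        // Every index of this submodel is shifted by the number of
        // vertices already packed ahead of it.
        const std::uint32_t base = static_cast<std::uint32_t>(out.verts.size());
        out.ranges.push_back({out.indices.size(), sm.indices.size()});
        out.verts.insert(out.verts.end(), sm.verts.begin(), sm.verts.end());
        for (std::uint32_t idx : sm.indices) {
            assert(idx < sm.verts.size());
            out.indices.push_back(base + idx);
        }
    }
    return out;
}
```

With a layout like this, drawing a turret would mean drawing its DrawRange out of the buffers that are already bound, rather than switching to another buffer.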

Even in the worst case, the performance gain is easily noticeable. But the performance gain increases for every submodel that a model has. A small fighter will see a nice performance boost from the simplified and optimized code, but a large ship with lots of turrets and other such submodels can get a tremendous performance improvement. And what's more, this also represents a change which allows OpenGL to better optimize memory usage.

But there remains room for future improvement as well. How VBOs are handled can still be optimized much further. A manager could be implemented which handles VBOs and tries to stuff as much info into as few of them as possible. VBOs work best when there are few of them and when they are around a certain size. That is more of a long term goal however, as it would require a good bit of graphics work; not a rewrite, but more than a weekend or two to code (if done well). In the short term it is possible to make more use of the VBO that we have, moving more info into it. Things such as glow points and insignia and possibly even thruster glows can be added to the VBO. It should make those things more efficient, optimize code and memory, and allow greater possibilities with those features because of shader use.

Updated IBX code.
IBX files were originally written just for dev use and were only meant to exist in the code base for a few months. All of these years later they are still around.

The new code, using the ".bx" extension, addresses every issue that I have had with the current IBX code. First off, it's just a bunch of ints now. The old code stored the actual list of vertex data, which was a pain to save and read back and also introduced some machine-specific errors into the equation. The new code takes advantage of one simple fact: all of that vertex info that was saved to the IBX file is always present. So the new code just stores the original position of the indexed vertexes and simply builds the new data by copying it from what already exists. No muss, no fuss.

Additionally, the old code would store the index data as either shorts or ints based on how many verts there were. The new code only stores ints and then converts to shorts if it can. This way we still get the better optimized data for the graphics card, but have a much easier file format to deal with.
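As an illustration of the two ideas above, here is a hedged sketch. It assumes that the cache holds only positions into vertex data the loader already has. The function names (rebuild_from_cache, try_narrow_indices) are made up for the example, and nothing here describes the real .bx file layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical vertex type for illustration only.
struct Point { float x, y, z; };

// Rebuild the render-ready vertex list from cached positions: each cached
// entry names the source vertex to copy, so no vertex data needs to be
// stored in the cache itself.
std::vector<Point> rebuild_from_cache(const std::vector<Point>& source,
                                      const std::vector<std::uint32_t>& cached_positions) {
    std::vector<Point> out;
    out.reserve(cached_positions.size());
    for (std::uint32_t pos : cached_positions) {
        assert(pos < source.size());  // a stale cache would trip this
        out.push_back(source[pos]);
    }
    return out;
}

// Narrow 32-bit indices to 16-bit when every value fits. Returns false and
// leaves `out` empty when the caller has to keep the 32-bit list.
bool try_narrow_indices(const std::vector<std::uint32_t>& in,
                        std::vector<std::uint16_t>& out) {
    out.clear();
    for (std::uint32_t i : in) {
        if (i > std::numeric_limits<std::uint16_t>::max()) {
            out.clear();
            return false;
        }
        out.push_back(static_cast<std::uint16_t>(i));
    }
    return true;
}
```

Storing one integer width on disk and narrowing at load time keeps the file reader simple while still handing the graphics card the smaller index type whenever the model allows it.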

Strive to get rid of immediate mode as much as possible.
Immediate mode is slow. If used a lot it really hurts performance. Most of the performance gain from the addition of the HTL code is simply from the fact that it reduced the dependence on immediate mode for rendering. But the thing is, you don't require HTL to do that, and it doesn't require advanced hardware or new OpenGL versions either. We aren't even taking full advantage of OpenGL version 1.1 features.

So, things are largely in place now with the gr_render() function. This is a replacement for the existing/old gr_tmapper() function, only using arrays rather than immediate mode for rendering. The main difference in using it is that it takes an array of vertex structs rather than an array of pointers to vertex structs. I have already converted some parts of the code to use gr_render() instead of gr_tmapper(), but there are still plenty of areas that could be converted. And the nice thing is that it doesn't take any real graphics knowledge or anything to do; any coder here should have the skills needed to make the changes.
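The difference between the two calling conventions can be sketched as below. This does not show the real signatures of gr_render() or gr_tmapper(); SketchVertex and flatten() are hypothetical names, and the only point is what a conversion from pointer lists to packed arrays involves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, trimmed-down vertex; the engine's real struct has more fields.
struct SketchVertex { float x, y, u, v; };

// Old-style callers hold an array of pointers to vertices scattered in memory.
// An array-based renderer wants the structs themselves, packed contiguously,
// so the whole batch can be handed over as one pointer plus a count.
std::vector<SketchVertex> flatten(SketchVertex* const* vert_ptrs, std::size_t count) {
    std::vector<SketchVertex> packed;
    packed.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        assert(vert_ptrs[i] != nullptr);
        packed.push_back(*vert_ptrs[i]);  // copy the struct, not the pointer
    }
    return packed;
}
```

In a real conversion the cheaper route is usually to build the packed array directly at the call site, so that no pointer list exists and no copy is needed.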

The one real downside with gr_render() is that right now it can't handle things requiring TMAP_FLAG_CORRECT in order to render correctly. For the most part this is just models however, and so it would only have a problem in -nohtl mode. Getting it to work with shaders wouldn't be a difficult task, but having it work without shaders is another matter. Someone may come up with a brilliant idea to solve that little dilemma though.

And an implementation issue with opengl_render_internal() is that it doesn't really use the texture matrix stack all that smartly. We can only depend on it being 2 deep, so it's possible to go over that in some situations. Currently it should only be used with interface graphics, so the chances of it getting messed up are slim. Still, that is something that needs to be addressed.

Memory usage.
Overrides for global new/new[]/delete/delete[] for one thing. Whether or not it's the best thing to do aside, it's better than nothing, which is what we have now. Either way it addresses a problem in the code.

But the primary thing: reduced the size of the vertex struct. That one struct is made heavy use of throughout the code. The problem was that it had things in it which were not really used. Originally 80-bytes, it now sits at 42-bytes. So a reduction of nearly half. And if you wonder whether 38 bytes really matters, realize that every vert in a model has one of these. So if you have a model with 30,000 verts, its memory usage went from about 2.3 meg to about 1.2 meg. That greatly reduces the amount of memory necessary to load and process a model.
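The arithmetic behind those figures can be checked with a few lines. The struct sizes and vertex count come from the post; the helper names are hypothetical, and "meg" is assumed here to mean 1024 × 1024 bytes.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kOldVertexBytes = 80;  // size quoted in the post
constexpr std::size_t kNewVertexBytes = 42;  // size quoted in the post

// Total bytes needed for the vertex structs of one model.
constexpr std::size_t model_bytes(std::size_t vert_count, std::size_t bytes_per_vert) {
    return vert_count * bytes_per_vert;
}

// Convert bytes to megabytes, assuming 1 meg = 1024 * 1024 bytes.
constexpr double to_meg(std::size_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

static_assert(model_bytes(30000, kOldVertexBytes) == 2400000, "30,000 verts at 80 bytes");
static_assert(model_bytes(30000, kNewVertexBytes) == 1260000, "30,000 verts at 42 bytes");
```

That works out to roughly 2.29 meg before and 1.20 meg after, which matches the "about 2.3" and "about 1.2" figures above.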

Better optimize various states which don't change often.
There are numerous things in the OpenGL code which are not quite done as efficiently as they could be. Some of these are states which are set every time that a texture is made active, even though those state settings never change once the texture is created and don't need to be set again. Changing the code to handle that in a more intelligent manner is not a big thing overall, but it leads to both cleaner and easier to work with code as well as offering better performance.
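One common way to avoid redundant state setting is a small cache that remembers the last value applied and skips the call when nothing changed. The sketch below is a generic illustration of that pattern, not a description of what the go_faster code does. StateCache and the `apply` callback are hypothetical, with `apply` standing in for a real driver call.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical cache that filters out state changes whose value is unchanged.
class StateCache {
public:
    // Returns true if the driver call was actually issued.
    template <typename ApplyFn>
    bool set(int state_id, int value, ApplyFn apply) {
        auto it = current_.find(state_id);
        if (it != current_.end() && it->second == value) {
            return false;  // already in this state: skip the redundant call
        }
        apply(state_id, value);      // stand-in for the real state-setting call
        current_[state_id] = value;  // remember what the driver now holds
        return true;
    }

private:
    std::unordered_map<int, int> current_;  // last value applied per state
};
```

For state that truly never changes after texture creation, setting it once at creation time is simpler still; a cache like this is more useful for state that changes occasionally.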

Another big offender is the model render code. It often checks and sets things which never change once a model starts rendering. So instead of doing those checks for every submodel, or even worse every texture on every submodel, I just moved them to only be done once. Simple enough, but greatly improves code readability and makes things faster too.

Shaders & Hery's code.
I really don't have any polite things to say about Hery's code, so I'm not even going to bother trying. This code should simply have never been permitted into trunk. And that it represents a precursor to what was planned for a code rewrite just scares the hell out of me. This stuff is basically just an alien parasite that got latched onto the code tree for little more than slowing things down and making coders nauseous. I tried to work with it, but it just was not possible.

So, I ripped it out.

The first thing that I noticed when I replaced the shader code with the old code was that Hery's new code was about 20% slower than the code it replaced. That is a considerable performance boost for something that took all of 10 minutes of work. And the old code is far easier to read and understand. There are quite a number of things which could be done to the shader setup to squeak out a bit of extra performance, and better hardware compatibility, but the new code was just so difficult to work with that those things were impossible without a rewrite. The old code isn't great, I wrote it, I should know. But the old code was written with the intention of being replaced either in whole or in part by something more efficient. I made a few small changes to the code to both benefit their use as well as to improve performance a little bit beyond what it had originally. I'm also hoping that it will allow another coder to more easily implement some larger changes later on for both better performance and compatibility.

The changes are rather minor, but should be noted. First, all shaders will now have a SHADER_MODEL define available to them, which will identify to the shader whether the hardware is at a SM2.0 (#if SHADER_MODEL == 2), SM3.0 or SM4.0 level. What this means for shader developers is that they can write in some more advanced features and not have to break shaders for lower-end hardware. This will also allow shaders to be used by SM2.0 hardware, if one is willing to make some sacrifices. Secondly, if the model shaders can't be used on the hardware for some reason, but GLSL is supported, then GLSL is no longer simply disabled: it stays available for other things, and only models fall back to being rendered via the fixed-function pipeline.
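How a define like SHADER_MODEL typically reaches a shader is that the engine prepends it to the source text before compiling. The helper below is a hypothetical sketch of that step only. It is not the engine's loader, and it assumes the shader source carries no "#version" line, which GLSL would require to come first.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: prefix shader source with a SHADER_MODEL define so the
// shader can branch with "#if SHADER_MODEL == 2". Assumes the source has no
// "#version" directive, since that would have to precede the define.
std::string with_shader_model(const std::string& source, int shader_model) {
    assert(shader_model >= 2 && shader_model <= 4);  // SM2.0, SM3.0 or SM4.0
    return "#define SHADER_MODEL " + std::to_string(shader_model) + "\n" + source;
}
```

A shader can then keep its advanced path inside a block guarded by the define and supply a cheaper fallback for SM2.0 hardware in the other branch.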

And since people would surely complain about post-processing being gone with the code rip, I rewrote it as well. So now the post-processing code is cleaned up and written in the same basic way as the rest of the graphics code. It should be less buggy, less resource intensive, and just easier to figure out how the damn code works. I did cut a few corners, since I only worked on it this past weekend, but it should be pretty easy to follow I hope. The corner cutting was mainly to keep the code as self-contained as possible so that it could be edited on later without really messing up other parts of the graphics code. This means that there are a few magic numbers in there, but I tried to comment everything so that it makes sense. The new code should be a functional equivalent of the original code, so how it worked before should be how it works now (aside from the crashing and all). Graphically it should produce the exact same results in other words. What was not implemented was the DoF code, since it was only an example in the old code and disabled. I also added the ability to see post-processing effects in the ship lab. This should make it a little quicker and easier to see what effects will look like without having to load up a mission. As an additional bonus, the new code is faster too. Plus, users can get post-processing without bloom by using "-bloom_intensity 0" (should they want/need such a thing for some reason).

Also, in the go_faster archive, there is a modified blur shader (post-v and blur-f). This was modified for a couple of reasons, the first being compatibility. The new version of the shader is about 10% faster in FPS, has no discernible difference in image quality, works with SM2.0 hardware, and remains compatible with the older code/builds. Just drop these files in place and get around a 10% FPS boost with post-processing, even with existing trunk builds. And with the GLSL changes I mentioned earlier, this also means that SM2.0 hardware people could take advantage of post-processing (for the most part), even if they aren't able to use shaders for model rendering.

FUBAR-BDHR · **Reply #1 on:** September 28, 2010, 12:53:43 am

One thing when using this: if you are using shaders, make sure that your card/drivers support at least shader version 3.0. If not, update or enable no glsl, as the code will no longer disable it automatically, resulting in slower performance instead of faster.

The E · **Reply #2 on:** September 28, 2010, 01:12:29 am

However, if your card supports at least Shader Model 2, you can use these shaders: http://blueplanet.fsmods.net/E/effects.7z

Unpack these to mediavps_3612/data/effects

Swifty · **Reply #3 on:** September 28, 2010, 01:47:28 am

FYI, the new HUD framework isn't in Antipodes yet. It will be as soon as I crank out a patch against Antipodes that I know for sure will work properly with the graphics tweaks. Likely in a couple days.

Fury · **Reply #4 on:** September 28, 2010, 02:08:50 am

So I wonder how much of a performance improvement there is in something like WiH? Some of those optimizations sound like they'd help a lot in Steve-O's ships.

The E · **Reply #5 on:** September 28, 2010, 02:13:31 am

There is some performance boost. Optimizations are still necessary, though.

Darius · **Reply #6 on:** September 28, 2010, 02:18:47 am

Scenes with Steve-O's ships (and the Solaris especially) have definitely got a framerate improvement.

General Battuta · **Reply #7 on:** September 28, 2010, 07:47:08 am

This is amazing. The VBO change alone had me drooling.

Hades · **Reply #8 on:** September 28, 2010, 10:19:41 am

This build is awesome, I get doubled FPS with it.

Making a note here: HUGE SUCCESS!!

Shivan Hunter · **Reply #9 on:** September 28, 2010, 10:33:54 am

This is most definitely a win.

80FPS in the beginning of Delenda Est, where I was getting 30-40 with those frakking Karunas on screen.

bp-massivebattle is even increased by an FPS or two... of course, using that thing as a benchmark is not even remotely fair.

General Battuta · **Reply #10 on:** September 28, 2010, 10:49:40 am

Hero_Swe reports a jump of 60 FPS, to 120, with 2 Karunas onscreen.

Topgun · **Reply #11 on:** September 28, 2010, 11:01:08 am

Taylor is my hero.

Sushi · **Reply #12 on:** September 28, 2010, 11:09:22 am

Just curious, what kind of drops do you get once the action starts?

I find that once the models get big and complex enough, the bottleneck isn't the GPU rendering them... it's the CPU trying to process collision detection on them.

Satellight · **Reply #13 on:** September 28, 2010, 11:12:17 am

In Delenda Est, I never had difficulty playing (with Antipode) due to this ****ing "technical plague". I use V-sync so I don't know how far I can go, but as far as I saw, never under 35-40 FPS even in the middle of the battle (PEW PEW PEW).
Spec : i7760, 4Gb Ram, HD 5850 XFX BE

EDIT : @Sushi : my drop to 35-40 FPS only happens at the very beginning, when the Karuna appears, and only for 1/2 second or less.

chief1983 · **Reply #14 on:** September 28, 2010, 02:14:19 pm

~~Rufus~~ Taylor, he's the man.

MatthTheGeek · **Reply #15 on:** September 28, 2010, 02:29:56 pm

Awesome improvement. DE actually got playable for me. Reached 30 fps at some points, although I still stayed much of the time between 8 and 10, full screen with all settings down.

chief1983 · **Reply #16 on:** September 28, 2010, 03:00:44 pm

I uploaded some of the mediashare files here if anyone wants a mirror.

CKid · **Reply #17 on:** September 28, 2010, 08:53:12 pm

When I try to select the build from the launcher I get an error and the launcher crashes on me. The only way that I can run the build is to actually open the freespace folder and click on the .exe itself.

General Battuta · **Reply #18 on:** September 28, 2010, 08:55:52 pm

Quote from: CKid on September 28, 2010, 08:53:12 pm

When I try to select the build from the launcher I get an error and the launcher crashes on me. The only way that I can run the build is to actually open the freespace folder and click on the .exe itself.

Use 5.5g or WXLauncher, not 5.5f.

It's 5, not 3 --The E

CKid · **Reply #19 on:** September 28, 2010, 09:14:49 pm

5.5g did the same thing but the WXLauncher worked out fine once I figured out how it works. Thank you for the quick response.
