Hard Light Productions Forums

Modding, Mission Design, and Coding => FS2 Open Coding - The Source Code Project (SCP) => Test Builds => Topic started by: Flaming_Sword on February 16, 2013, 01:09:42 am

Title: Performance Improvements (Updated 22/06/14)
Post by: Flaming_Sword on February 16, 2013, 01:09:42 am
I was laying the groundwork for that EFF container stuff when I got a bit sidetracked. I think I ended up with a higher framerate instead... :D

Compiled with MSVC2008 Express. Based on Antipodes Revision 9533. Please compare with other builds, particularly clean Antipodes and clean Trunk.

Executable (Trunk Release Inferno SSE2):
http://www.mediafire.com/?xa93tn9ap7v7vws

Executable (Antipodes Release SSE2):
http://www.mediafire.com/?dw5jresv3sr5154

Executable (Antipodes Release SSE2 + BP Compatibility build):
http://www.mediafire.com/?llrtdqz1iy7gg7t

Patch:
http://www.mediafire.com/?je4uay9mifujo8j


The patch works on both Antipodes and Trunk since the files affected are identical between the two.

Testing was done with the massive battle mission provided with Blue Planet 2 (which explains the build with the BP Compatibility patch). That one happens to have lots of beams and explosions that kill my framerate, so improvements tend to be noticeable.

Feel free to review the code in the patch - I suspect the same sorts of changes could be made to the code that loads textures in the first place for a further performance boost.

UPDATE:

Built it on Linux on an underpowered machine, with BP compatibility and without my changes, and got some profiling info while running Artemis at 4 fps. I think others might find this information useful.

http://www.mediafire.com/?1k1cw1auirfharr

UPDATE 2:

Renamed thread.

More improvements with the aid of the profiler data. Messed with bm_load() so missions should load a bit faster (needs quantifying). Seems to have improved framerate too.

Hit the following dead ends (no improvement or made things worse):

fvi_ray_boundingbox() - the other, supposedly efficient algorithm I tried involved many more floating point operations and turned the game into a slideshow (trying to run ray tracing on the CPU, go figure - may consider offloading to the GPU somehow)
see http://tavianator.com/2011/05/fast-branchless-raybounding-box-intersections/
I was able to halve my framerate by adding extra useless lines of code, so this function might be one of the bigger CPU bottlenecks
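
For readers who don't want to follow the link, the branchless "slab" test it describes can be sketched roughly as below. This is a minimal sketch, not the code that was tried in the patch: `Vec3` and `ray_hits_box` are hypothetical names, the reciprocal direction is assumed to be precomputed once per ray, and the `t >= 0` clamp (so that boxes behind the ray origin are rejected) is an addition for illustration. Unlike fvi_ray_boundingbox() it does not return a hit point.

```cpp
#include <algorithm>

// Hypothetical minimal vector type for illustration; FSO's vec3d differs.
struct Vec3 { float x, y, z; };

// Slab test: true if the ray p0 + t*dir (t >= 0) intersects the
// axis-aligned box [bmin, bmax]. inv_dir holds 1/dir per component.
// Note: a zero direction component gives an infinite inv_dir, which
// can produce NaN when the origin lies exactly on that slab's plane.
inline bool ray_hits_box(const Vec3 &bmin, const Vec3 &bmax,
                         const Vec3 &p0, const Vec3 &inv_dir)
{
    // Entry/exit distances for the x slab.
    float t1 = (bmin.x - p0.x) * inv_dir.x;
    float t2 = (bmax.x - p0.x) * inv_dir.x;
    float tmin = std::min(t1, t2);
    float tmax = std::max(t1, t2);

    // Narrow the interval with the y slab.
    t1 = (bmin.y - p0.y) * inv_dir.y;
    t2 = (bmax.y - p0.y) * inv_dir.y;
    tmin = std::max(tmin, std::min(t1, t2));
    tmax = std::min(tmax, std::max(t1, t2));

    // Narrow the interval with the z slab.
    t1 = (bmin.z - p0.z) * inv_dir.z;
    t2 = (bmax.z - p0.z) * inv_dir.z;
    tmin = std::max(tmin, std::min(t1, t2));
    tmax = std::min(tmax, std::max(t1, t2));

    // Hit if the interval is non-empty and not entirely behind the origin.
    return tmax >= std::max(tmin, 0.0f);
}
```

Each axis costs two subtractions, two multiplications and four min/max operations regardless of where the ray starts, whereas the Graphics Gems version can exit early, which fits the observation above that the replacement did more floating point work.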

model_octant_find_faces_sub() - attempts to make changes similar to the bm_lock() improvements resulted in a performance decrease

Executable (Antipodes Release SSE2 + BP Compatibility build):
http://www.mediafire.com/?1n1056ab5z614mi

Patch:
http://www.mediafire.com/?8kq27ni3kli5u7n


UPDATE 3:

2nd patch degrades performance and should be ignored.


Update 4:

Reverted the TGA-specific changes, so TGA no longer looks like crap. Patch attached.


Update 5:

New patch - previous patches should be disregarded. Needs testing.

http://www.mediafire.com/download/omd55wlt6o9jh0j/inline_trunk_9771.patch

Linux build:

http://www.mediafire.com/download/wp23mjm3vp5sa7e/fs2_open_3.7.1_inline.tar.gz

Compiled on LMDE x64

Windows builds: http://blueplanet.fsmods.net/E/stuff/Flaming_Sword_Speedup.7z


Update 6:

Added tweak to shader string comparison code - new patch attached:

Linux Build: fs2_open_3.7.1_inline_string_compare_trunk_9802.tar.gz (http://www.mediafire.com/download/3kbgy4unm6j4bz4/fs2_open_3.7.1_inline_string_compare_trunk_9802.tar.gz)

Mac Build: FS2_Open_FS_ISC_r9802.zip (http://swc.fs2downloads.com/builds/OSX/FS2_Open_FS_ISC_r9802.zip)

[attachment deleted by ninja]

Update 7:

Updated patch for trunk 10338, posted below and attached.

Update 8:

Updated for trunk 10831 - Now with windows executables (unix goodies in separate patch)

http://www.mediafire.com/download/9u05k7edc8040y5/inline_string_compare_trunk_10831.patch
http://www.mediafire.com/download/zc2ytfo3z2c9rgb/inline_string_compare_trunk_10831.7z
http://www.mediafire.com/download/b9wp9uzc2wt49aw/unix_goodies_trunk_10831.patch - stacktrace printing and symbols for gdb
Title: Re: bm_lock() Performance Improvements
Post by: Kobrar44 on February 16, 2013, 08:47:04 am
Tests on both Artemis and Massive Battle. Massive Battle is very random, so I considered Artemis more reliable.
(http://oi48.tinypic.com/2ij677k.jpg)
(http://oi45.tinypic.com/2b4go1.jpg)

I think I slowed Artemis down a tiny bit on the BP comp build by taking a few screenshots; it shouldn't make a huge difference.

No problems encountered on test build.
Title: Re: bm_lock() Performance Improvements
Post by: Echelon9 on February 16, 2013, 05:35:43 pm
Perhaps it's the way I'm reading your charts, but it looks like this patch slowed down FPS for you but less % of GPU was used?
Title: Re: bm_lock() Performance Improvements
Post by: Kobrar44 on February 16, 2013, 08:02:35 pm
Massive Battle appears to be CPU-intensive. The GPU didn't get a chance to break a sweat, and every Massive Battle run is different. Artemis had a performance gain, though.
Title: Re: bm_lock() Performance Improvements
Post by: Echelon9 on February 16, 2013, 10:34:21 pm
Ah, so Artemis is the second chart?
Title: Re: Performance Improvements (Updated 25/02/13)
Post by: Flaming_Sword on February 25, 2013, 07:03:52 pm
Can you please rerun the tests on the new build and post the results?
Title: Re: Performance Improvements (Updated 25/02/13)
Post by: Kobrar44 on February 26, 2013, 04:16:37 am
First of all, a rendering comparison:
correct (http://www.youtube.com/watch?feature=player_detailpage&v=_140jxPitCk#t=199s)
performance tweaks (http://oi48.tinypic.com/2boenl.jpg)
Secondly, the charts:
(http://oi45.tinypic.com/2lncltc.jpg)
(http://oi45.tinypic.com/outfeb.jpg)
Title: Re: Performance Improvements (Updated 25/02/13)
Post by: Luis Dias on February 26, 2013, 04:35:56 am
Do you have access to the raw data? You could place it in a spreadsheet and give a new chart that outputs "bp comp" - "perf-tweaks". You can also output the average of that amount and so on. Eyeballing it, it seems that the improvement in FPS is next to zero, if not negative.
Title: Re: Performance Improvements (Updated 25/02/13)
Post by: Flaming_Sword on February 26, 2013, 07:03:26 am
That's funny... I was testing by watching the FPS counter in the corner of the screen when the missions were running. How reliable/unreliable is that measure? How were you getting your data? Can I get a copy of the trailer mission for testing? :P
Title: Re: Performance Improvements (Updated 25/02/13)
Post by: Kobrar44 on February 26, 2013, 09:09:02 am
The mission should be here: http://199.91.152.85/6i5utbbyya9g/pjvoudwd9as5o2o/FSO+Trailer+Mission.7z [by default it links to "mediavps" instead of "mediavps_3612"]
I use MSI Afterburner, but I am going to check out HWiNFO.
Title: Re: Performance Improvements (Updated 03/03/13)
Post by: Flaming_Sword on March 02, 2013, 07:34:41 pm
I can confirm the results - the 2nd patch should be ignored.
Title: Re: Performance Improvements (Updated 03/03/13)
Post by: Flaming_Sword on March 10, 2013, 12:37:18 am
Stupid question, but when you tested with the FSO trailer, did you have any rendering issues? Particularly with the text that shows at the beginning (tga textures).
Title: Re: Performance Improvements (Updated 03/03/13)
Post by: Kobrar44 on March 10, 2013, 03:23:37 pm
My post began with exactly that issue:
first of all:
correct (http://www.youtube.com/watch?feature=player_detailpage&v=_140jxPitCk#t=199s)
performance tweaks (http://oi48.tinypic.com/2boenl.jpg)
The performance tweaks build lacks the black background and has **** colours on bitmaps, in case it's not apparent. The font and bitmaps were messed up at the beginning too.
Title: Re: Performance Improvements (Updated 03/03/13)
Post by: Flaming_Sword on March 10, 2013, 07:22:23 pm
Not just me then. Something in there is messing with rendering of TGA. PNG/DDS work fine in my testing, which is strange because I did the same changes to them that I did to TGA. Continuing to investigate.
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: Flaming_Sword on March 15, 2013, 09:30:42 pm
Removing the TGA specific changes seems to have fixed the issue.
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: Echelon9 on March 15, 2013, 10:07:23 pm
Is there any hard evidence that these changes are performance improvements?

To date, it seems users are reporting nothing compelling beyond marginal improvement.
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: Flaming_Sword on March 16, 2013, 12:13:29 am
Quote: Is there any hard evidence that these changes are performance improvements? To date, it seems users are reporting nothing compelling beyond marginal improvement.

Uh, the charts? The patch itself that you can test?

I also assume that a marginal improvement is better than no improvement at all. Feel free to investigate further methods for improving performance. We can only benefit from such work.
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: jg18 on March 16, 2013, 02:13:11 am
Quote: Is there any hard evidence that these changes are performance improvements? To date, it seems users are reporting nothing compelling beyond marginal improvement.

As in we should collect mean frame rates with and without the changes from a bunch of people on randomly assigned mission segments and run a paired t-test (http://en.wikipedia.org/wiki/Two-sample_t-test#Dependent_t-test_for_paired_samples)? It'd be doable.

/me had to pull out his college stats book for that.
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: Flaming_Sword on March 16, 2013, 06:12:46 am
We should probably collect framerate data for the whole run so we can do more analysis after the fact. Variance in framerate is also a consideration.
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: jg18 on March 16, 2013, 08:13:12 pm
Over the whole run = over an entire mission? What sort of analysis? And a consideration how? Although admittedly we're approaching the limits of my stats knowledge...

The idea I was going with was hypothesis testing with the null hypothesis being "the difference in average frame rate comparing mission segments played with your changes to the same segments played without them is zero (or otherwise insignificant)" and then running a paired t-test to see whether it's possible to reject H0 with p < 0.05. I don't have my stats book nearby, so the wording may be a bit off.
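
As a rough sketch of the computation being proposed here: the paired t statistic is the mean of the per-segment differences divided by its standard error. `paired_t_statistic` is a hypothetical helper, and the inputs are assumed to be mean frame rates for the same mission segments played without and with the changes.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Paired t statistic for two equally long series of per-segment mean FPS.
// With d[i] = patched[i] - baseline[i], returns mean(d) / (sd(d) / sqrt(n)).
// Returns 0 if there are fewer than two pairs, if the sizes differ,
// or if all differences are identical (no spread to test against).
inline double paired_t_statistic(const std::vector<double> &baseline,
                                 const std::vector<double> &patched)
{
    const std::size_t n = baseline.size();
    if (n < 2 || patched.size() != n) {
        return 0.0;
    }

    // Mean of the paired differences.
    double mean = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean += patched[i] - baseline[i];
    }
    mean /= static_cast<double>(n);

    // Sample standard deviation of the differences.
    double ss = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dev = (patched[i] - baseline[i]) - mean;
        ss += dev * dev;
    }
    const double sd = std::sqrt(ss / static_cast<double>(n - 1));
    if (sd == 0.0) {
        return 0.0;
    }

    return mean / (sd / std::sqrt(static_cast<double>(n)));
}
```

The result would then be compared against the t distribution with n - 1 degrees of freedom to decide whether H0 can be rejected at the chosen significance level.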

EDIT: I figure that since E9 asked for hard evidence, this could qualify as that, if done right.
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: Kobrar44 on March 17, 2013, 07:46:06 am
When would these optimizations give a performance boost? I guess the best way to test is to FRED a mission with a whole lot of that specific stuff in a controlled way.
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: Flaming_Sword on March 17, 2013, 08:07:02 am
The biggest effect should be in the loading of textures, especially when rendering animations (loads *every* frame to be rendered).

I would go for something involving lots of ships jumping in and getting destroyed, as well as lots of beamfire. The sort of thing that tends to kill framerate anyway.

Such as massive battle...
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: Goober5000 on June 12, 2013, 01:24:34 am
:bump: for progress.  Also for news on the new EFF format. :nervous:
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: mjn.mixael on June 12, 2013, 08:27:35 am
As much as I want the new EFF format.. isn't 3.7 final far more important?
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: Goober5000 on June 12, 2013, 04:32:07 pm
Yes indeed.  But Flaming_Sword is a bit of a unique case.  He hasn't been closely involved in SCP development for a few years now, but he is just about the only person who has made serious progress on the new EFF format (or any similar format).  So I asked him back just for that.  I'm concerned he may have dropped off the grid again. :(
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: Flaming_Sword on August 15, 2013, 02:24:06 am
Sorry for the absence, between a blown up machine and work (and Guild Wars 2 :nervous:), I haven't been looking at FS2 for a while.

That said, the performance patch was attached in this thread and can be merged wherever whenever. EFF has that spec document lying around, not sure if there are still copies online somewhere. I recall finding an actual TLV library on sourceforge or something, but it didn't seem like it suited our needs here.

I believe any EFF code I've written so far is in the "WTF was I thinking when I wrote that?" stage and will likely get thrown out. Hopefully, it's all still sitting on a hard drive somewhere in the house...

If anything the first stage of EFF would be a separate generic library for reading/writing so it might be a fair while before any FS2 code rears its head.
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: Goober5000 on August 18, 2013, 06:25:48 pm
Hmm.  If you mean the patch in the first post, it got deleted in a recent attachment purge.  Could you re-upload it?
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: Flaming_Sword on August 19, 2013, 10:57:12 pm
I'll dig around for it. I'm also pretty sure this APNG thing might work out better than what I had for EFF, which would relegate my EFF ideas to simply a potential replacement for VPs.
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: Flaming_Sword on August 21, 2013, 10:21:01 am
Pretty sure this one is the patch.

[attachment deleted by ninja]
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: Flaming_Sword on September 07, 2013, 11:11:34 am
Now I feel stupid. The following patch is probably a much better way to do it. Please test.

Code:
Index: code/model/modelcollide.cpp
===================================================================
--- code/model/modelcollide.cpp (revision 9771)
+++ code/model/modelcollide.cpp (working copy)
@@ -76,7 +76,7 @@
 // Returns non-zero if vector from p0 to pdir
 // intersects the bounding box.
 // hitpos could be NULL, so don't fill it if it is.
-int mc_ray_boundingbox( vec3d *min, vec3d *max, vec3d * p0, vec3d *pdir, vec3d *hitpos )
+inline int mc_ray_boundingbox( vec3d *min, vec3d *max, vec3d * p0, vec3d *pdir, vec3d *hitpos )
 {
 
  vec3d tmp_hitpos;
@@ -481,7 +481,7 @@
 
 int model_collide_sub( void *model_ptr );
 
-void model_collide_sortnorm(ubyte * p)
+inline void model_collide_sortnorm(ubyte * p)
 {
  int frontlist = w(p+36);
  int backlist = w(p+40);
@@ -509,7 +509,7 @@
 
 //calls the object interpreter to render an object.  The object renderer
 //is really a seperate pipeline. returns true if drew
-int model_collide_sub(void *model_ptr )
+inline int model_collide_sub(void *model_ptr )
 {
  ubyte *p = (ubyte *)model_ptr;
  int chunk_type, chunk_size;
@@ -602,7 +602,7 @@
  }
 }
 
-void model_collide_bsp(bsp_collision_tree *tree, int node_index)
+inline void model_collide_bsp(bsp_collision_tree *tree, int node_index)
 {
  if ( tree->node_list == NULL || tree->n_verts <= 0) {
  return;
@@ -949,7 +949,7 @@
  return false;
 }
 
-bool mc_check_sldc(int offset)
+inline bool mc_check_sldc(int offset)
 {
  if (offset > Mc_pm->sldc_size-5) //no way is this big enough
  return false;
@@ -999,7 +999,7 @@
 }
 
 // checks a vector collision against a ships shield (if it has shield points defined).
-void mc_check_shield()
+inline void mc_check_shield()
 {
  int i;
 
@@ -1031,7 +1031,7 @@
 
 // This function recursively checks a submodel and its children
 // for a collision with a vector.
-void mc_check_subobj( int mn )
+inline void mc_check_subobj( int mn )
 {
  vec3d tempv;
  vec3d hitpt; // used in bounding box check
Index: code/math/fvi.cpp
===================================================================
--- code/math/fvi.cpp (revision 9771)
+++ code/math/fvi.cpp (working copy)
@@ -314,75 +314,75 @@
  return 0;
 }
 
-/**
- * Finds intersection of a ray and an axis-aligned bounding box
- *
- * Given a ray with origin at p0, and direction pdir, this function
- * returns non-zero if that ray intersects an axis-aligned bounding box
- * from min to max.   If there was an intersection, then hitpt will contain
- * the point where the ray begins inside the box.
- * Fast ray-box intersection taken from Graphics Gems I, pages 395,736.
- */
-int fvi_ray_boundingbox( vec3d *min, vec3d *max, vec3d * p0, vec3d *pdir, vec3d *hitpt )
-{
- bool inside = true;
- bool middle[3] = { true, true, true };
- int i;
- int which_plane;
- float maxt[3];
- float candidate_plane[3];
+///**
+// * Finds intersection of a ray and an axis-aligned bounding box
+// *
+// * Given a ray with origin at p0, and direction pdir, this function
+// * returns non-zero if that ray intersects an axis-aligned bounding box
+// * from min to max.   If there was an intersection, then hitpt will contain
+// * the point where the ray begins inside the box.
+// * Fast ray-box intersection taken from Graphics Gems I, pages 395,736.
+// */
+//int fvi_ray_boundingbox( vec3d *min, vec3d *max, vec3d * p0, vec3d *pdir, vec3d *hitpt )
+//{
+// bool inside = true;
+// bool middle[3] = { true, true, true };
+// int i;
+// int which_plane;
+// float maxt[3];
+// float candidate_plane[3];
+//
+// for (i = 0; i < 3; i++) {
+// if (p0->a1d[i] < min->a1d[i]) {
+// candidate_plane[i] = min->a1d[i];
+// middle[i] = false;
+// inside = false;
+// } else if (p0->a1d[i] > max->a1d[i]) {
+// candidate_plane[i] = max->a1d[i];
+// middle[i] = false;
+// inside = false;
+// }
+// }
+//
+// // ray origin inside bounding box
+// if ( inside ) {
+// *hitpt = *p0;
+// return 1;
+// }
+//
+// // calculate T distances to candidate plane
+// for (i = 0; i < 3; i++) {
+// if ( !middle[i] && (pdir->a1d[i] != 0.0f) )
+// maxt[i] = (candidate_plane[i] - p0->a1d[i]) / pdir->a1d[i];
+// else
+// maxt[i] = -1.0f;
+// }
+//
+// // Get largest of the maxt's for final choice of intersection
+// which_plane = 0;
+// for (i = 1; i < 3; i++) {
+// if (maxt[which_plane] < maxt[i])
+// which_plane = i;
+// }
+//
+// // check final candidate actually inside box
+// if (maxt[which_plane] < 0.0f)
+// return 0;
+//
+// for (i = 0; i < 3; i++) {
+// if (which_plane != i) {
+// hitpt->a1d[i] = p0->a1d[i] + maxt[which_plane] * pdir->a1d[i];
+//
+// if ( (hitpt->a1d[i] < min->a1d[i]) || (hitpt->a1d[i] > max->a1d[i]) )
+// return 0;
+// } else {
+// hitpt->a1d[i] = candidate_plane[i];
+// }
+// }
+//
+// return 1;
+//}
 
- for (i = 0; i < 3; i++) {
- if (p0->a1d[i] < min->a1d[i]) {
- candidate_plane[i] = min->a1d[i];
- middle[i] = false;
- inside = false;
- } else if (p0->a1d[i] > max->a1d[i]) {
- candidate_plane[i] = max->a1d[i];
- middle[i] = false;
- inside = false;
- }
- }
-
- // ray origin inside bounding box
- if ( inside ) {
- *hitpt = *p0;
- return 1;
- }
-
- // calculate T distances to candidate plane
- for (i = 0; i < 3; i++) {
- if ( !middle[i] && (pdir->a1d[i] != 0.0f) )
- maxt[i] = (candidate_plane[i] - p0->a1d[i]) / pdir->a1d[i];
- else
- maxt[i] = -1.0f;
- }
-
- // Get largest of the maxt's for final choice of intersection
- which_plane = 0;
- for (i = 1; i < 3; i++) {
- if (maxt[which_plane] < maxt[i])
- which_plane = i;
- }
-
- // check final candidate actually inside box
- if (maxt[which_plane] < 0.0f)
- return 0;
-
- for (i = 0; i < 3; i++) {
- if (which_plane != i) {
- hitpt->a1d[i] = p0->a1d[i] + maxt[which_plane] * pdir->a1d[i];
-
- if ( (hitpt->a1d[i] < min->a1d[i]) || (hitpt->a1d[i] > max->a1d[i]) )
- return 0;
- } else {
- hitpt->a1d[i] = candidate_plane[i];
- }
- }
-
- return 1;
-}
-
 /**
  * Given largest componant of normal, return i & j
  * If largest componant is negative, swap i & j
Index: code/math/fvi.h
===================================================================
--- code/math/fvi.h (revision 9771)
+++ code/math/fvi.h (working copy)
@@ -104,8 +104,77 @@
 // from min to max.   If there was an intersection, then hitpt will contain
 // the point where the ray begins inside the box.
 // Fast ray-box intersection taken from Graphics Gems I, pages 395,736.
-int fvi_ray_boundingbox( vec3d *min, vec3d *max, vec3d * p0, vec3d *pdir, vec3d *hitpt );
+//int fvi_ray_boundingbox( vec3d *min, vec3d *max, vec3d * p0, vec3d *pdir, vec3d *hitpt );
 
+/**
+ * Finds intersection of a ray and an axis-aligned bounding box
+ *
+ * Given a ray with origin at p0, and direction pdir, this function
+ * returns non-zero if that ray intersects an axis-aligned bounding box
+ * from min to max.   If there was an intersection, then hitpt will contain
+ * the point where the ray begins inside the box.
+ * Fast ray-box intersection taken from Graphics Gems I, pages 395,736.
+ */
+inline int fvi_ray_boundingbox( vec3d *min, vec3d *max, vec3d * p0, vec3d *pdir, vec3d *hitpt )
+{
+  bool inside = true;
+  bool middle[3] = { true, true, true };
+  int i;
+  int which_plane;
+  float maxt[3];
+  float candidate_plane[3];
+
+  for (i = 0; i < 3; i++) {
+    if (p0->a1d[i] < min->a1d[i]) {
+      candidate_plane[i] = min->a1d[i];
+      middle[i] = false;
+      inside = false;
+    } else if (p0->a1d[i] > max->a1d[i]) {
+      candidate_plane[i] = max->a1d[i];
+      middle[i] = false;
+      inside = false;
+    }
+  }
+
+  // ray origin inside bounding box
+  if ( inside ) {
+    *hitpt = *p0;
+    return 1;
+  }
+
+  // calculate T distances to candidate plane
+  for (i = 0; i < 3; i++) {
+    if ( !middle[i] && (pdir->a1d[i] != 0.0f) )
+      maxt[i] = (candidate_plane[i] - p0->a1d[i]) / pdir->a1d[i];
+    else
+      maxt[i] = -1.0f;
+  }
+
+  // Get largest of the maxt's for final choice of intersection
+  which_plane = 0;
+  for (i = 1; i < 3; i++) {
+    if (maxt[which_plane] < maxt[i])
+      which_plane = i;
+  }
+
+  // check final candidate actually inside box
+  if (maxt[which_plane] < 0.0f)
+    return 0;
+
+  for (i = 0; i < 3; i++) {
+    if (which_plane != i) {
+      hitpt->a1d[i] = p0->a1d[i] + maxt[which_plane] * pdir->a1d[i];
+
+      if ( (hitpt->a1d[i] < min->a1d[i]) || (hitpt->a1d[i] > max->a1d[i]) )
+        return 0;
+    } else {
+      hitpt->a1d[i] = candidate_plane[i];
+    }
+  }
+
+  return 1;
+}
+
 // sphere polygon collision prototypes
 
 // Given a polygon vertex list and a moving sphere, find the first contact the sphere makes with the edge, if any
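
As a reading aid for the fvi.cpp/fvi.h part of the patch above: it moves a function body out of the .cpp file into the header and marks it `inline`, so every file that includes the header sees the body and the compiler is free to expand calls in place. A minimal sketch of the same pattern follows; `Vec3f`, `dot3` and the header name are made up for illustration and are not from the FSO source.

```cpp
// vec_util.h (hypothetical header, for illustration only)
#ifndef VEC_UTIL_H
#define VEC_UTIL_H

struct Vec3f { float x, y, z; };

// Defined in the header and marked inline: the same definition may be
// compiled into several translation units without breaking the
// one-definition rule, and the optimizer can expand calls in place.
// Whether a given call is actually inlined remains up to the compiler.
inline float dot3(const Vec3f &a, const Vec3f &b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

#endif // VEC_UTIL_H
```

How much this helps depends on the compiler and its settings, so it is worth measuring per build.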
Title: Re: Performance Improvements (Updated 16/03/13)
Post by: Goober5000 on September 07, 2013, 05:40:53 pm
Maybe you could post a build with that patch applied? :nervous:
Title: Re: Performance Improvements (Updated 08/09/13)
Post by: Flaming_Sword on September 07, 2013, 11:11:59 pm
 :bump:

New patch and linux build posted. I don't have access to a Windows build environment at the moment.
Title: Re: Performance Improvements (Updated 08/09/13)
Post by: The E on September 08, 2013, 07:34:15 am
I've taken the liberty of uploading Windows builds myself and editing the post.
Title: Re: Performance Improvements (Updated 08/09/13)
Post by: chief1983 on September 11, 2013, 06:53:55 pm
I'll try to get a Mac one up if no one else does.
Title: Re: Performance Improvements (Updated 21/09/13)
Post by: Flaming_Sword on September 20, 2013, 11:38:12 am
 :bump:

New patch posted, with a tweak for shaders. Also included: Linux debug builds print a stack trace upon int3(); (run "./autogen.sh --enable-debug" again)

Hopefully, this plays nice with the builds posted here:

http://www.hard-light.net/forums/index.php?topic=85618.0
Title: Re: Performance Improvements (Updated 21/09/13)
Post by: chief1983 on September 20, 2013, 05:55:32 pm
Mac build link added to first post.
Title: Re: Performance Improvements (Updated 21/09/13)
Post by: niffiwan on September 20, 2013, 08:10:52 pm
Quote: Also included: Linux debug builds print a stack trace upon int3(); (run "./autogen.sh --enable-debug" again)

That's very cool :)  Since we tend to be removing int3()'s from the code, could this be applied to Assert/Assertion as well?

And... :nervous: is it feasible to implement a "pause to attach debugger" style dialogue like the Windows builds have? There have been a number of occasions when I really wished that I could do that (usually when trying to track down hard-to-reproduce bugs).
Title: Re: Performance Improvements (Updated 21/09/13)
Post by: Flaming_Sword on September 20, 2013, 08:50:21 pm
The code can simply be moved to where it's needed.

Also, I'm pretty sure you can use gdb since debug versions are compiled with -g.
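
For anyone wanting to hook the same thing into Assert/Assertion, a stack trace can be printed on Linux with glibc's `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>`. The sketch below is an assumption about the general approach, not the code from the patch; `print_stacktrace` is a hypothetical name.

```cpp
#include <execinfo.h>   // glibc: backtrace(), backtrace_symbols_fd()
#include <unistd.h>     // STDERR_FILENO

// Prints up to 64 frames of the current call stack to stderr and
// returns the number of frames captured.
inline int print_stacktrace()
{
    void *frames[64];
    const int count = backtrace(frames, 64);

    // Writes one line per frame straight to the file descriptor,
    // without allocating memory.
    backtrace_symbols_fd(frames, count, STDERR_FILENO);
    return count;
}
```

Function names only show up in the output if the executable is linked with -rdynamic; otherwise the lines contain raw addresses, which gdb or addr2line can resolve in a build compiled with -g.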
Title: Re: Performance Improvements (Updated 25/01/14)
Post by: Flaming_Sword on January 24, 2014, 07:10:40 pm
:bump:

Updated patch for trunk 10338, posted below and attached.

Code:
Index: code/bmpman/bm_internal.h
===================================================================
--- code/bmpman/bm_internal.h (revision 10338)
+++ code/bmpman/bm_internal.h (working copy)
@@ -21,6 +21,12 @@
 #include "globalincs/pstypes.h"
 #include "bmpman/bmpman.h"
 
+#include "ddsutils/ddsutils.h"
+#include "pcxutils/pcxutils.h"
+#include "pngutils/pngutils.h"
+#include "jpgutils/jpgutils.h"
+#include "tgautils/tgautils.h"
+#include "palman/palman.h"
 
 // no-type ( used in: bm_bitmaps[i].type )
 #define BM_TYPE_NONE 0
@@ -114,18 +120,444 @@
 } bitmap_entry;
 
 extern bitmap_entry bm_bitmaps[MAX_BITMAPS];
+extern int Is_standalone;
+//
+//// image specific lock functions
+//void bm_lock_ani( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags );
+//void bm_lock_dds( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags );
+//void bm_lock_png( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags );
+//void bm_lock_jpg( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags );
+//void bm_lock_pcx( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags );
+//void bm_lock_tga( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags );
+//void bm_lock_user( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags );
+#define EFF_FILENAME_CHECK { if ( be->type == BM_TYPE_EFF ) strncpy( filename, be->info.ani.eff.filename, MAX_FILENAME_LEN ); else strncpy( filename, be->filename, MAX_FILENAME_LEN ); }
 
+inline void bm_lock_pcx( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags )
+{
+  ubyte *data;
+  int pcx_error;
+  char filename[MAX_FILENAME_LEN];
 
-// image specific lock functions
-void bm_lock_ani( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags );
-void bm_lock_dds( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags );
-void bm_lock_png( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags );
-void bm_lock_jpg( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags );
-void bm_lock_pcx( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags );
-void bm_lock_tga( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags );
-void bm_lock_user( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags );
+  // Unload any existing data
+  bm_free_data( bitmapnum, false );
 
+  be->mem_taken = (bmp->w * bmp->h * (bpp >> 3));
+  data = (ubyte *)bm_malloc(bitmapnum, be->mem_taken);
+  bmp->bpp = bpp;
+  bmp->data = (ptr_u)data;
+  bmp->palette = (bpp == 8) ? gr_palette : NULL;
+  memset( data, 0, be->mem_taken );
 
+  Assert( &be->bm == bmp );
+#ifdef BMPMAN_NDEBUG
+  Assert( be->data_size > 0 );
+#endif
+
+  // some sanity checks on flags
+  Assert(!((flags & BMP_AABITMAP) && (flags & BMP_TEX_ANY)));           // no aabitmap textures
+
+  // make sure we are using the correct filename in the case of an EFF.
+  // this will populate filename[] whether it's EFF or not
+  EFF_FILENAME_CHECK;
+
+  pcx_error = pcx_read_bitmap( filename, data, NULL, (bpp >> 3), (flags & BMP_AABITMAP), 0, be->dir_type );
+
+  if ( pcx_error != PCX_ERROR_NONE ) {
+    mprintf(("Couldn't load PCX!!! (%s)\n", filename));
+    return;
+  }
+
+#ifdef BMPMAN_NDEBUG
+  Assert( be->data_size > 0 );
+#endif
+
+  bmp->flags = 0;
+
+  bm_convert_format( bitmapnum, bmp, bpp, flags );
+}
+
+inline void bm_lock_ani( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags )
+{
+  anim        *the_anim;
+  anim_instance *the_anim_instance;
+  bitmap      *bm;
+  ubyte       *frame_data;
+  int       size, i;
+  int       first_frame, nframes;
+
+  first_frame = be->info.ani.first_frame;
+  nframes = bm_bitmaps[first_frame].info.ani.num_frames;
+
+  if ( (the_anim = anim_load(bm_bitmaps[first_frame].filename, bm_bitmaps[first_frame].dir_type)) == NULL ) {
+    nprintf(("BMPMAN", "Error opening %s in bm_lock\n", be->filename));
+    return;
+  }
+
+  if ( (the_anim_instance = init_anim_instance(the_anim, bpp)) == NULL ) {
+    nprintf(("BMPMAN", "Error opening %s in bm_lock\n", be->filename));
+    anim_free(the_anim);
+    return;
+  }
+
+  int can_drop_frames = 0;
+
+  if ( the_anim->total_frames != bm_bitmaps[first_frame].info.ani.num_frames )  {
+    can_drop_frames = 1;
+  }
+
+  bm = &bm_bitmaps[first_frame].bm;
+  size = bm->w * bm->h * (bpp >> 3);
+  be->mem_taken = size;
+
+  Assert( size > 0 );
+
+  for ( i=0; i<nframes; i++ ) {
+    be = &bm_bitmaps[first_frame+i];
+    bm = &bm_bitmaps[first_frame+i].bm;
+
+    // Unload any existing data
+    bm_free_data( first_frame+i, false );
+
+    bm->flags = 0;
+
+    // briefing editor in Fred2 uses aabitmaps (ani's) - force to 8 bit
+    bm->bpp = Is_standalone ? (ubyte)8 : bpp;
+
+    bm->data = (ptr_u)bm_malloc(first_frame + i, size);
+
+    frame_data = anim_get_next_raw_buffer(the_anim_instance, 0 ,flags & BMP_AABITMAP ? 1 : 0, bm->bpp);
+
+    ubyte *dptr, *sptr;
+
+    sptr = frame_data;
+    dptr = (ubyte *)bm->data;
+
+    if ( (bm->w!=the_anim->width) || (bm->h!=the_anim->height) )  {
+      // Scale it down
+      // 8 bit
+      if(bpp == 8){
+        int w,h;
+        fix u, utmp, v, du, dv;
+
+        u = v = 0;
+
+        du = ( the_anim->width*F1_0 ) / bm->w;
+        dv = ( the_anim->height*F1_0 ) / bm->h;
+
+        for (h = 0; h < bm->h; h++) {
+          ubyte *drow = &dptr[bm->w * h];
+          ubyte *srow = &sptr[f2i(v)*the_anim->width];
+
+          utmp = u;
+
+          for (w = 0; w < bm->w; w++) {
+            *drow++ = srow[f2i(utmp)];
+            utmp += du;
+          }
+          v += dv;
+        }
+      }
+      // 16 bpp
+      else {
+        int w,h;
+        fix u, utmp, v, du, dv;
+
+        u = v = 0;
+
+        du = ( the_anim->width*F1_0 ) / bm->w;
+        dv = ( the_anim->height*F1_0 ) / bm->h;
+
+        for (h = 0; h < bm->h; h++) {
+          unsigned short *drow = &((unsigned short*)dptr)[bm->w * h];
+          unsigned short *srow = &((unsigned short*)sptr)[f2i(v)*the_anim->width];
+
+          utmp = u;
+
+          for (w = 0; w < bm->w; w++) {
+            *drow++ = srow[f2i(utmp)];
+            utmp += du;
+          }
+          v += dv;
+        }
+      }
+    } else {
+      // 1-to-1 mapping
+      memcpy(dptr, sptr, size);
+    }
+
+    bm_convert_format( first_frame+i, bm, bpp, flags );
+
+    // Skip a frame
+    if ( (i < nframes-1)  && can_drop_frames )  {
+      frame_data = anim_get_next_raw_buffer(the_anim_instance, 0, flags & BMP_AABITMAP ? 1 : 0, bm->bpp);
+    }
+  }
+
+  free_anim_instance(the_anim_instance);
+  anim_free(the_anim);
+}
+
+
+inline void bm_lock_user( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags )
+{
+  // Unload any existing data
+  bm_free_data( bitmapnum, false );
+
+  if ((bpp != be->info.user.bpp) && !(flags & BMP_AABITMAP))
+    bpp = be->info.user.bpp;
+
+  switch ( bpp ) {
+    case 32:  // user 32-bit bitmap
+      bmp->bpp = bpp;
+      bmp->flags = be->info.user.flags;
+      bmp->data = (ptr_u)be->info.user.data;
+      break;
+
+    case 24:  // user 24-bit bitmap
+      bmp->bpp = bpp;
+      bmp->flags = be->info.user.flags;
+      bmp->data = (ptr_u)be->info.user.data;
+      break;
+
+    case 16:      // user 16 bit bitmap
+      bmp->bpp = bpp;
+      bmp->flags = be->info.user.flags;
+      bmp->data = (ptr_u)be->info.user.data;
+      break;
+
+    case 8:     // Going from 8 bpp to something (probably only for aabitmaps)
+      Assert(flags & BMP_AABITMAP);
+      bmp->bpp = bpp;
+      bmp->flags = be->info.user.flags;
+      bmp->data = (ptr_u)be->info.user.data;
+      break;
+
+     default:
+      Error( LOCATION, "Unhandled user bitmap conversion from %d to %d bpp", be->info.user.bpp, bmp->bpp );
+      break;
+  }
+
+  bm_convert_format( bitmapnum, bmp, bpp, flags );
+}
+
+inline void bm_lock_tga( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags )
+{
+  ubyte *data = NULL;
+  int d_size, byte_size;
+  char filename[MAX_FILENAME_LEN];
+
+  // Unload any existing data
+  bm_free_data( bitmapnum, false );
+
+  if(Is_standalone){
+    Assert(bpp == 8);
+  }
+  else
+  {
+    Assert( (bpp == 16) || (bpp == 24 ) || (bpp == 32) );
+  }
+
+  // allocate bitmap data
+  byte_size = (bpp >> 3);
+
+  Assert( byte_size );
+  Assert( be->mem_taken > 0 );
+
+  data = (ubyte*)bm_malloc(bitmapnum, be->mem_taken);
+
+  if (data) {
+    memset( data, 0, be->mem_taken);
+    d_size = byte_size;
+  } else {
+    return;
+  }
+
+  bmp->bpp = bpp;
+  bmp->data = (ptr_u)data;
+  bmp->palette = NULL;
+
+  Assert( &be->bm == bmp );
+#ifdef BMPMAN_NDEBUG
+  Assert( be->data_size > 0 );
+#endif
+
+  int tga_error;
+
+  // make sure we are using the correct filename in the case of an EFF.
+  // this will populate filename[] whether it's EFF or not
+  EFF_FILENAME_CHECK;
+
+  tga_error = targa_read_bitmap( filename, data, NULL, d_size, be->dir_type);
+
+  if ( tga_error != TARGA_ERROR_NONE )  {
+    bm_free_data( bitmapnum, false );
+    return;
+  }
+
+  bmp->flags = 0;
+
+  bm_convert_format( bitmapnum, bmp, bpp, flags );
+}
+
+/**
+ * Lock a DDS file
+ */
+inline void bm_lock_dds( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags )
+{
+  ubyte *data = NULL;
+  int error;
+  ubyte dds_bpp = 0;
+  char filename[MAX_FILENAME_LEN];
+
+  // free any existing data
+  bm_free_data( bitmapnum, false );
+
+  Assert( be->mem_taken > 0 );
+  Assert( &be->bm == bmp );
+
+  data = (ubyte*)bm_malloc(bitmapnum, be->mem_taken);
+
+  if ( data == NULL )
+    return;
+
+  memset( data, 0, be->mem_taken );
+
+  // make sure we are using the correct filename in the case of an EFF.
+  // this will populate filename[] whether it's EFF or not
+  EFF_FILENAME_CHECK;
+
+  error = dds_read_bitmap( filename, data, &dds_bpp, be->dir_type );
+
+#if BYTE_ORDER == BIG_ENDIAN
+  // same as with TGA, we need to byte swap 16 & 32-bit, uncompressed, DDS images
+  if ( (be->comp_type == BM_TYPE_DDS) || (be->comp_type == BM_TYPE_CUBEMAP_DDS) ) {
+    unsigned int i = 0;
+
+    if (dds_bpp == 32) {
+      unsigned int *swap_tmp;
+
+      for (i = 0; i < (unsigned int)be->mem_taken; i += 4) {
+        swap_tmp = (unsigned int *)(data + i);
+        *swap_tmp = INTEL_INT(*swap_tmp);
+      }
+    } else if (dds_bpp == 16) {
+      unsigned short *swap_tmp;
+
+      for (i = 0; i < (unsigned int)be->mem_taken; i += 2) {
+        swap_tmp = (unsigned short *)(data + i);
+        *swap_tmp = INTEL_SHORT(*swap_tmp);
+      }
+    }
+  }
+#endif
+
+  bmp->bpp = dds_bpp;
+  bmp->data = (ptr_u)data;
+  bmp->flags = 0;
+
+  if (error != DDS_ERROR_NONE) {
+    bm_free_data( bitmapnum, false );
+    return;
+  }
+
+#ifdef BMPMAN_NDEBUG
+  Assert( be->data_size > 0 );
+#endif
+}
+
+/**
+ * Lock a PNG file
+ */
+inline void bm_lock_png( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags )
+{
+  ubyte *data = NULL;
+  //assume 32 bit - libpng should expand everything
+  int d_size;
+  int png_error = PNG_ERROR_INVALID;
+  char filename[MAX_FILENAME_LEN];
+
+  // Unload any existing data
+  bm_free_data( bitmapnum, false );
+
+  // allocate bitmap data
+  Assert( bmp->w * bmp->h > 0 );
+
+  //if it's not 32-bit, we expand when we read it
+  bmp->bpp = 32;
+  d_size = bmp->bpp >> 3;
+  //we waste memory if it turns out to be 24-bit, but the way this whole thing works is dodgy anyway
+  data = (ubyte*)bm_malloc(bitmapnum, bmp->w * bmp->h * d_size);
+  if (data == NULL)
+    return;
+  memset( data, 0, bmp->w * bmp->h * d_size);
+  bmp->data = (ptr_u)data;
+  bmp->palette = NULL;
+
+  Assert( &be->bm == bmp );
+
+  // make sure we are using the correct filename in the case of an EFF.
+  // this will populate filename[] whether it's EFF or not
+  EFF_FILENAME_CHECK;
+
+  //bmp->bpp gets set correctly in here after reading into memory
+  png_error = png_read_bitmap( filename, data, &bmp->bpp, d_size, be->dir_type );
+
+  if ( png_error != PNG_ERROR_NONE )  {
+    bm_free_data( bitmapnum, false );
+    return;
+  }
+
+#ifdef BMPMAN_NDEBUG
+  Assert( be->data_size > 0 );
+#endif
+}
+
+/**
+ * Lock a JPEG file
+ */
+inline void bm_lock_jpg( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags )
+{
+  ubyte *data = NULL;
+  int d_size = 0;
+  int jpg_error = JPEG_ERROR_INVALID;
+  char filename[MAX_FILENAME_LEN];
+
+  // Unload any existing data
+  bm_free_data( bitmapnum, false );
+
+  d_size = (bpp >> 3);
+
+  // allocate bitmap data
+  Assert( be->mem_taken > 0 );
+  data = (ubyte*)bm_malloc(bitmapnum, be->mem_taken);
+
+  if (data == NULL)
+    return;
+
+  memset( data, 0, be->mem_taken);
+
+  bmp->bpp = bpp;
+  bmp->data = (ptr_u)data;
+  bmp->palette = NULL;
+
+  Assert( &be->bm == bmp );
+
+  // make sure we are using the correct filename in the case of an EFF.
+  // this will populate filename[] whether it's EFF or not
+  EFF_FILENAME_CHECK;
+
+  jpg_error = jpeg_read_bitmap( filename, data, NULL, d_size, be->dir_type );
+
+  if ( jpg_error != JPEG_ERROR_NONE ) {
+    bm_free_data( bitmapnum, false );
+    return;
+  }
+
+#ifdef BMPMAN_NDEBUG
+  Assert( be->data_size > 0 );
+#endif
+}
+
 #endif // BMPMAN_INTERNAL
 
 #endif // __BM_INTERNAL_H__
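
Note on the ANI scaling loops in bm_lock_ani() above: they are a nearest-neighbour resample driven by 16.16 fixed-point steps. Below is a minimal standalone sketch of the 8-bit case for anyone who wants to test it outside the engine. fix16, FIX_ONE, fix_to_int() and scale_8bit() are made-up stand-ins for the engine's fix, F1_0 and f2i(), not the real definitions, and int is assumed to be 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the engine's fix type, F1_0 and f2i():
// 16.16 fixed point stored in an int (assumed to be at least 32 bits wide).
typedef int fix16;
const fix16 FIX_ONE = 1 << 16;

inline int fix_to_int(fix16 f) { return f >> 16; }

// Nearest-neighbour resample of an 8-bit image from (sw x sh) to (dw x dh).
std::vector<unsigned char> scale_8bit(const std::vector<unsigned char> &src,
                                      int sw, int sh, int dw, int dh)
{
    assert(sw > 0 && sh > 0 && dw > 0 && dh > 0);
    assert(src.size() == static_cast<std::size_t>(sw) * sh);
    // Keep sw * FIX_ONE and sh * FIX_ONE inside a 32-bit int.
    assert(sw < 32768 && sh < 32768);

    std::vector<unsigned char> dst(static_cast<std::size_t>(dw) * dh);

    const fix16 du = (sw * FIX_ONE) / dw;   // source step per destination column
    const fix16 dv = (sh * FIX_ONE) / dh;   // source step per destination row

    fix16 v = 0;
    for (int y = 0; y < dh; y++) {
        const unsigned char *srow = &src[static_cast<std::size_t>(fix_to_int(v)) * sw];
        unsigned char *drow = &dst[static_cast<std::size_t>(y) * dw];

        fix16 u = 0;
        for (int x = 0; x < dw; x++) {
            *drow++ = srow[fix_to_int(u)];   // take the nearest source pixel
            u += du;
        }
        v += dv;
    }
    return dst;
}
```

Because du and dv are rounded down, the source index stays below the source width and height for every destination pixel, so the same loop is safe for upscaling as well as downscaling.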
Index: code/bmpman/bmpman.cpp
===================================================================
--- code/bmpman/bmpman.cpp (revision 10338)
+++ code/bmpman/bmpman.cpp (working copy)
@@ -109,7 +109,7 @@
 /**
  * Frees a bitmaps data if it can
  */
-static void bm_free_data(int n, bool release = false)
+void bm_free_data(int n, bool release = false)
 {
  bitmap_entry *be;
  bitmap *bmp;
@@ -225,7 +225,7 @@
 
 void bm_clean_slot(int n)
 {
- bm_free_data(n);
+ bm_free_data(n, false);
 }
 
 
@@ -265,7 +265,7 @@
  int i;
  if ( bm_inited ) {
  for (i=0; i<MAX_BITMAPS; i++ ) {
- bm_free_data(i); // clears flags, bbp, data, etc
+ bm_free_data(i, false); // clears flags, bbp, data, etc
  }
  bm_inited = 0;
  }
@@ -304,7 +304,7 @@
 
  gr_bm_init(i);
 
- bm_free_data(i);  // clears flags, bbp, data, etc
+ bm_free_data(i, false);  // clears flags, bbp, data, etc
  }
 }
 
@@ -1191,7 +1191,7 @@
  return bm_bitmaps[bitmapnum].type;
 }
 
-static void bm_convert_format( int bitmapnum, bitmap *bmp, ubyte bpp, ubyte flags )
+void bm_convert_format( int bitmapnum, bitmap *bmp, ubyte bpp, ubyte flags )
 {
  int idx;
 
@@ -1211,7 +1211,7 @@
 
  // maybe swizzle to be an xparent texture
  if(!(bmp->flags & BMP_TEX_XPARENT) && (flags & BMP_TEX_XPARENT)){
- for(idx=0; idx<bmp->w*bmp->h; idx++){
+ for(idx=0; idx<bmp->w*bmp->h; idx++){
  // if the pixel is transparent
  if ( ((ushort*)bmp->data)[idx] == Gr_t_green.mask) {
  ((ushort*)bmp->data)[idx] = 0;
@@ -1219,9 +1219,12 @@
  }
 
  bmp->flags |= BMP_TEX_XPARENT;
- }
+ }
 }
 
+MONITOR(NumBitmapPage)
+MONITOR(SizeBitmapPage)
+
 // basically, map the bitmap into the current palette. used to be done for all pcx's, now just for
 // Fred, since its the only thing that uses the software tmapper
 void bm_swizzle_8bit_for_fred(bitmap_entry *be, bitmap *bmp, ubyte *data, ubyte *palette)
@@ -1229,436 +1232,7 @@
 /* 2004/10/17 - taylor - no longer needed since FRED is OGL now*/
 }
 
-void bm_lock_pcx( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags )
-{
- ubyte *data;
- int pcx_error;
- char filename[MAX_FILENAME_LEN];
-
- // Unload any existing data
- bm_free_data( bitmapnum );
-
- be->mem_taken = (bmp->w * bmp->h * (bpp >> 3));
- data = (ubyte *)bm_malloc(bitmapnum, be->mem_taken);
- bmp->bpp = bpp;
- bmp->data = (ptr_u)data;
- bmp->palette = (bpp == 8) ? gr_palette : NULL;
- memset( data, 0, be->mem_taken );
-
- Assert( &be->bm == bmp );
-#ifdef BMPMAN_NDEBUG
- Assert( be->data_size > 0 );
-#endif
-
- // some sanity checks on flags
- Assert(!((flags & BMP_AABITMAP) && (flags & BMP_TEX_ANY))); // no aabitmap textures
-
- // make sure we are using the correct filename in the case of an EFF.
- // this will populate filename[] whether it's EFF or not
- EFF_FILENAME_CHECK;
-
- pcx_error = pcx_read_bitmap( filename, data, NULL, (bpp >> 3), (flags & BMP_AABITMAP), 0, be->dir_type );
-
- if ( pcx_error != PCX_ERROR_NONE ) {
- mprintf(("Couldn't load PCX!!! (%s)\n", filename));
- return;
- }
-
-#ifdef BMPMAN_NDEBUG
- Assert( be->data_size > 0 );
-#endif
-
- bmp->flags = 0;
-
- bm_convert_format( bitmapnum, bmp, bpp, flags );
-}
-
-void bm_lock_ani( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags )
-{
- anim *the_anim;
- anim_instance *the_anim_instance;
- bitmap *bm;
- ubyte *frame_data;
- int size, i;
- int first_frame, nframes;
-
- first_frame = be->info.ani.first_frame;
- nframes = bm_bitmaps[first_frame].info.ani.num_frames;
-
- if ( (the_anim = anim_load(bm_bitmaps[first_frame].filename, bm_bitmaps[first_frame].dir_type)) == NULL ) {
- nprintf(("BMPMAN", "Error opening %s in bm_lock\n", be->filename));
- return;
- }
-
- if ( (the_anim_instance = init_anim_instance(the_anim, bpp)) == NULL ) {
- nprintf(("BMPMAN", "Error opening %s in bm_lock\n", be->filename));
- anim_free(the_anim);
- return;
- }
-
- int can_drop_frames = 0;
-
- if ( the_anim->total_frames != bm_bitmaps[first_frame].info.ani.num_frames ) {
- can_drop_frames = 1;
- }
-
- bm = &bm_bitmaps[first_frame].bm;
- size = bm->w * bm->h * (bpp >> 3);
- be->mem_taken = size;
-
- Assert( size > 0 );
-
- for ( i=0; i<nframes; i++ ) {
- be = &bm_bitmaps[first_frame+i];
- bm = &bm_bitmaps[first_frame+i].bm;
-
- // Unload any existing data
- bm_free_data( first_frame+i );
-
- bm->flags = 0;
-
- // briefing editor in Fred2 uses aabitmaps (ani's) - force to 8 bit
- bm->bpp = Is_standalone ? (ubyte)8 : bpp;
-
- bm->data = (ptr_u)bm_malloc(first_frame + i, size);
-
- frame_data = anim_get_next_raw_buffer(the_anim_instance, 0 ,flags & BMP_AABITMAP ? 1 : 0, bm->bpp);
-
- ubyte *dptr, *sptr;
-
- sptr = frame_data;
- dptr = (ubyte *)bm->data;
-
- if ( (bm->w!=the_anim->width) || (bm->h!=the_anim->height) ) {
- // Scale it down
- // 8 bit
- if(bpp == 8){
- int w,h;
- fix u, utmp, v, du, dv;
-
- u = v = 0;
-
- du = ( the_anim->width*F1_0 ) / bm->w;
- dv = ( the_anim->height*F1_0 ) / bm->h;
-
- for (h = 0; h < bm->h; h++) {
- ubyte *drow = &dptr[bm->w * h];
- ubyte *srow = &sptr[f2i(v)*the_anim->width];
-
- utmp = u;
-
- for (w = 0; w < bm->w; w++) {
- *drow++ = srow[f2i(utmp)];
- utmp += du;
- }
- v += dv;
- }
- }
- // 16 bpp
- else {
- int w,h;
- fix u, utmp, v, du, dv;
-
- u = v = 0;
-
- du = ( the_anim->width*F1_0 ) / bm->w;
- dv = ( the_anim->height*F1_0 ) / bm->h;
-
- for (h = 0; h < bm->h; h++) {
- unsigned short *drow = &((unsigned short*)dptr)[bm->w * h];
- unsigned short *srow = &((unsigned short*)sptr)[f2i(v)*the_anim->width];
-
- utmp = u;
-
- for (w = 0; w < bm->w; w++) {
- *drow++ = srow[f2i(utmp)];
- utmp += du;
- }
- v += dv;
- }
- }
- } else {
- // 1-to-1 mapping
- memcpy(dptr, sptr, size);
- }
-
- bm_convert_format( first_frame+i, bm, bpp, flags );
-
- // Skip a frame
- if ( (i < nframes-1)  && can_drop_frames ) {
- frame_data = anim_get_next_raw_buffer(the_anim_instance, 0, flags & BMP_AABITMAP ? 1 : 0, bm->bpp);
- }
- }
-
- free_anim_instance(the_anim_instance);
- anim_free(the_anim);
-}
-
-
-void bm_lock_user( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags )
-{
- // Unload any existing data
- bm_free_data( bitmapnum );
-
- if ((bpp != be->info.user.bpp) && !(flags & BMP_AABITMAP))
- bpp = be->info.user.bpp;
-
- switch ( bpp ) {
- case 32: // user 32-bit bitmap
- bmp->bpp = bpp;
- bmp->flags = be->info.user.flags;
- bmp->data = (ptr_u)be->info.user.data;
- break;
-
- case 24: // user 24-bit bitmap
- bmp->bpp = bpp;
- bmp->flags = be->info.user.flags;
- bmp->data = (ptr_u)be->info.user.data;
- break;
-
- case 16: // user 16 bit bitmap
- bmp->bpp = bpp;
- bmp->flags = be->info.user.flags;
- bmp->data = (ptr_u)be->info.user.data;
- break;
-
- case 8: // Going from 8 bpp to something (probably only for aabitmaps)
- Assert(flags & BMP_AABITMAP);
- bmp->bpp = bpp;
- bmp->flags = be->info.user.flags;
- bmp->data = (ptr_u)be->info.user.data;
- break;
-
- default:
- Error( LOCATION, "Unhandled user bitmap conversion from %d to %d bpp", be->info.user.bpp, bmp->bpp );
- break;
- }
-
- bm_convert_format( bitmapnum, bmp, bpp, flags );
-}
-
-void bm_lock_tga( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags )
-{
- ubyte *data = NULL;
- int d_size, byte_size;
- char filename[MAX_FILENAME_LEN];
-
- // Unload any existing data
- bm_free_data( bitmapnum );
-
- if(Is_standalone){
- Assert(bpp == 8);
- }
- else
- {
- Assert( (bpp == 16) || (bpp == 24 ) || (bpp == 32) );
- }
-
- // allocate bitmap data
- byte_size = (bpp >> 3);
-
- Assert( byte_size );
- Assert( be->mem_taken > 0 );
-
- data = (ubyte*)bm_malloc(bitmapnum, be->mem_taken);
-
- if (data) {
- memset( data, 0, be->mem_taken);
- d_size = byte_size;
- } else {
- return;
- }
-
- bmp->bpp = bpp;
- bmp->data = (ptr_u)data;
- bmp->palette = NULL;
-
- Assert( &be->bm == bmp );
-#ifdef BMPMAN_NDEBUG
- Assert( be->data_size > 0 );
-#endif
-
- int tga_error;
-
- // make sure we are using the correct filename in the case of an EFF.
- // this will populate filename[] whether it's EFF or not
- EFF_FILENAME_CHECK;
-
- tga_error = targa_read_bitmap( filename, data, NULL, d_size, be->dir_type);
-
- if ( tga_error != TARGA_ERROR_NONE ) {
- bm_free_data( bitmapnum );
- return;
- }
-
- bmp->flags = 0;
-
- bm_convert_format( bitmapnum, bmp, bpp, flags );
-}
-
 /**
- * Lock a DDS file
- */
-void bm_lock_dds( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags )
-{
- ubyte *data = NULL;
- int error;
- ubyte dds_bpp = 0;
- char filename[MAX_FILENAME_LEN];
-
- // free any existing data
- bm_free_data( bitmapnum );
-
- Assert( be->mem_taken > 0 );
- Assert( &be->bm == bmp );
-
- data = (ubyte*)bm_malloc(bitmapnum, be->mem_taken);
-
- if ( data == NULL )
- return;
-
- memset( data, 0, be->mem_taken );
-
- // make sure we are using the correct filename in the case of an EFF.
- // this will populate filename[] whether it's EFF or not
- EFF_FILENAME_CHECK;
-
- error = dds_read_bitmap( filename, data, &dds_bpp, be->dir_type );
-
-#if BYTE_ORDER == BIG_ENDIAN
- // same as with TGA, we need to byte swap 16 & 32-bit, uncompressed, DDS images
- if ( (be->comp_type == BM_TYPE_DDS) || (be->comp_type == BM_TYPE_CUBEMAP_DDS) ) {
- unsigned int i = 0;
-
- if (dds_bpp == 32) {
- unsigned int *swap_tmp;
-
- for (i = 0; i < (unsigned int)be->mem_taken; i += 4) {
- swap_tmp = (unsigned int *)(data + i);
- *swap_tmp = INTEL_INT(*swap_tmp);
- }
- } else if (dds_bpp == 16) {
- unsigned short *swap_tmp;
-
- for (i = 0; i < (unsigned int)be->mem_taken; i += 2) {
- swap_tmp = (unsigned short *)(data + i);
- *swap_tmp = INTEL_SHORT(*swap_tmp);
- }
- }
- }
-#endif
-
- bmp->bpp = dds_bpp;
- bmp->data = (ptr_u)data;
- bmp->flags = 0;
-
- if (error != DDS_ERROR_NONE) {
- bm_free_data( bitmapnum );
- return;
- }
-
-#ifdef BMPMAN_NDEBUG
- Assert( be->data_size > 0 );
-#endif
-}
-
-/**
- * Lock a PNG file
- */
-void bm_lock_png( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags )
-{
- ubyte *data = NULL;
- //assume 32 bit - libpng should expand everything
- int d_size;
- int png_error = PNG_ERROR_INVALID;
- char filename[MAX_FILENAME_LEN];
-
- // Unload any existing data
- bm_free_data( bitmapnum );
-
- // allocate bitmap data
- Assert( bmp->w * bmp->h > 0 );
-
- //if it's not 32-bit, we expand when we read it
- bmp->bpp = 32;
- d_size = bmp->bpp >> 3;
- //we waste memory if it turns out to be 24-bit, but the way this whole thing works is dodgy anyway
- data = (ubyte*)bm_malloc(bitmapnum, bmp->w * bmp->h * d_size);
- if (data == NULL)
- return;
- memset( data, 0, bmp->w * bmp->h * d_size);
- bmp->data = (ptr_u)data;
- bmp->palette = NULL;
-
- Assert( &be->bm == bmp );
-
- // make sure we are using the correct filename in the case of an EFF.
- // this will populate filename[] whether it's EFF or not
- EFF_FILENAME_CHECK;
-
- //bmp->bpp gets set correctly in here after reading into memory
- png_error = png_read_bitmap( filename, data, &bmp->bpp, d_size, be->dir_type );
-
- if ( png_error != PNG_ERROR_NONE ) {
- bm_free_data( bitmapnum );
- return;
- }
-
-#ifdef BMPMAN_NDEBUG
- Assert( be->data_size > 0 );
-#endif
-}
-
-/**
- * Lock a JPEG file
- */
-void bm_lock_jpg( int handle, int bitmapnum, bitmap_entry *be, bitmap *bmp, ubyte bpp, ubyte flags )
-{
- ubyte *data = NULL;
- int d_size = 0;
- int jpg_error = JPEG_ERROR_INVALID;
- char filename[MAX_FILENAME_LEN];
-
- // Unload any existing data
- bm_free_data( bitmapnum );
-
- d_size = (bpp >> 3);
-
- // allocate bitmap data
- Assert( be->mem_taken > 0 );
- data = (ubyte*)bm_malloc(bitmapnum, be->mem_taken);
-
- if (data == NULL)
- return;
-
- memset( data, 0, be->mem_taken);
-
- bmp->bpp = bpp;
- bmp->data = (ptr_u)data;
- bmp->palette = NULL;
-
- Assert( &be->bm == bmp );
-
- // make sure we are using the correct filename in the case of an EFF.
- // this will populate filename[] whether it's EFF or not
- EFF_FILENAME_CHECK;
-
- jpg_error = jpeg_read_bitmap( filename, data, NULL, d_size, be->dir_type );
-
- if ( jpg_error != JPEG_ERROR_NONE ) {
- bm_free_data( bitmapnum );
- return;
- }
-
-#ifdef BMPMAN_NDEBUG
- Assert( be->data_size > 0 );
-#endif
-}
-
-MONITOR( NumBitmapPage )
-MONITOR( SizeBitmapPage )
-
-/**
  * This locks down a bitmap and returns a pointer to a bitmap that can be accessed until you call bm_unlock.
  *
  * Only lock a bitmap when you need it! This will convert it into the appropriate format also.
@@ -1995,7 +1569,7 @@
  } else {
  if(!nodebug)
  nprintf(("BmpMan", "Unloading %s.  %dx%dx%d\n", be->filename, bmp->w, bmp->h, bmp->bpp));
- bm_free_data(n); // clears flags, bbp, data, etc
+ bm_free_data(n, false); // clears flags, bbp, data, etc
  }
 
  return 1;
@@ -2075,7 +1649,7 @@
  int i;
  for (i = 0; i < MAX_BITMAPS; i++) {
  if ( bm_bitmaps[i].type != BM_TYPE_NONE ) {
- bm_free_data(i);
+ bm_free_data(i, false);
  }
  }
  dc_printf( "Total RAM after flush: %d bytes\n", bm_texture_ram );
Index: code/bmpman/bmpman.h
===================================================================
--- code/bmpman/bmpman.h (revision 10338)
+++ code/bmpman/bmpman.h (working copy)
@@ -285,4 +285,7 @@
 int bm_set_render_target(int handle, int face = -1);
 
 int bm_load_and_parse_eff(const char *filename, int dir_type, int *nframes, int *nfps, int *key, ubyte *type);
+
+void bm_free_data(int n, bool release);
+void bm_convert_format( int bitmapnum, bitmap *bmp, ubyte bpp, ubyte flags );
 #endif
Index: code/graphics/gropenglpostprocessing.cpp
===================================================================
--- code/graphics/gropenglpostprocessing.cpp (revision 10338)
+++ code/graphics/gropenglpostprocessing.cpp (working copy)
@@ -338,6 +338,8 @@
  GLboolean light = GL_state.Lighting(GL_FALSE);
  GLboolean blend = GL_state.Blend(GL_FALSE);
  GLboolean cull = GL_state.CullFace(GL_FALSE);
+  const char *name;
+  float value;
 
  GL_state.Texture.SetShaderMode(GL_TRUE);
 
@@ -424,10 +426,10 @@
 
  for (size_t idx = 0; idx < Post_effects.size(); idx++) {
  if ( GL_post_shader[Post_active_shader_index].flags2 & (1<<idx) ) {
- const char *name = Post_effects[idx].uniform_name.c_str();
- float value = Post_effects[idx].intensity;
+ name = Post_effects[idx].uniform_name.c_str();
+ value = Post_effects[idx].intensity;
 
- vglUniform1fARB( opengl_shader_get_uniform(name), value);
+ vglUniform1fARB( (name == NULL) ? (-1) : opengl_shader_get_uniform(name), value);
  }
  }
 
Index: code/graphics/gropenglshader.cpp
===================================================================
--- code/graphics/gropenglshader.cpp (revision 10338)
+++ code/graphics/gropenglshader.cpp (working copy)
@@ -704,20 +704,22 @@
  */
 GLint opengl_shader_get_uniform(const char *uniform_text)
 {
- if ( (Current_shader == NULL) || (uniform_text == NULL) ) {
+ if (Current_shader == NULL) {
  Int3();
  return -1;
  }
 
  SCP_vector<opengl_shader_uniform_t>::iterator uniform;
- SCP_vector<opengl_shader_uniform_t>::iterator uniforms_end = Current_shader->uniforms.end();
-
- for (uniform = Current_shader->uniforms.begin(); uniform != uniforms_end; ++uniform) {
- if ( !uniform->text_id.compare(uniform_text) ) {
- return uniform->location;
- }
- }
 
+ for (uniform = Current_shader->uniforms.begin(); uniform != Current_shader->uniforms.end(); ++uniform) {
+    if(uniform->text_id.c_str()[0] == uniform_text[0] && (uniform_text[0] == '\0' || uniform->text_id.c_str()[1] == uniform_text[1]))
+    {
+      if ( !uniform->text_id.compare(uniform_text) ) {
+        return uniform->location;
+      }
+    }
+  }
+
  return -1;
 }
 
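
Note on the uniform lookup above: the idea is a cheap test on the first two characters before paying for the full string compare. Here is a standalone sketch of that idea which reads one char at a time, so it needs no pointer casts and never reads past the terminator of an empty name. uniform_entry, same_prefix2() and find_uniform() are hypothetical names for illustration only, not part of the engine, and whether the pre-check actually helps should be measured with the profiler.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for opengl_shader_uniform_t.
struct uniform_entry {
    std::string text_id;
    int location;
};

// True when the first two characters of a and b match. Reads one char at a
// time, so an empty string is never read past its terminator.
inline bool same_prefix2(const char *a, const char *b)
{
    if (a[0] != b[0])
        return false;
    if (a[0] == '\0')
        return true;            // both strings are empty
    return a[1] == b[1];
}

// Linear lookup with a cheap prefix test before the full comparison.
// Returns -1 when the name is NULL or not found.
int find_uniform(const std::vector<uniform_entry> &uniforms, const char *name)
{
    if (name == NULL)
        return -1;

    for (std::vector<uniform_entry>::const_iterator it = uniforms.begin();
         it != uniforms.end(); ++it) {
        if (same_prefix2(it->text_id.c_str(), name) &&
            std::strcmp(it->text_id.c_str(), name) == 0) {
            return it->location;
        }
    }
    return -1;
}
```

Keeping the NULL check inside the lookup costs one branch per call and protects every caller, not only the post-processing loop.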
Index: code/math/fvi.cpp
===================================================================
--- code/math/fvi.cpp (revision 10338)
+++ code/math/fvi.cpp (working copy)
@@ -315,75 +315,6 @@
 }
 
 /**
- * Finds intersection of a ray and an axis-aligned bounding box
- *
- * Given a ray with origin at p0, and direction pdir, this function
- * returns non-zero if that ray intersects an axis-aligned bounding box
- * from min to max.   If there was an intersection, then hitpt will contain
- * the point where the ray begins inside the box.
- * Fast ray-box intersection taken from Graphics Gems I, pages 395,736.
- */
-int fvi_ray_boundingbox( vec3d *min, vec3d *max, vec3d * p0, vec3d *pdir, vec3d *hitpt )
-{
- bool inside = true;
- bool middle[3] = { true, true, true };
- int i;
- int which_plane;
- float maxt[3];
- float candidate_plane[3];
-
- for (i = 0; i < 3; i++) {
- if (p0->a1d[i] < min->a1d[i]) {
- candidate_plane[i] = min->a1d[i];
- middle[i] = false;
- inside = false;
- } else if (p0->a1d[i] > max->a1d[i]) {
- candidate_plane[i] = max->a1d[i];
- middle[i] = false;
- inside = false;
- }
- }
-
- // ray origin inside bounding box
- if ( inside ) {
- *hitpt = *p0;
- return 1;
- }
-
- // calculate T distances to candidate plane
- for (i = 0; i < 3; i++) {
- if ( !middle[i] && (pdir->a1d[i] != 0.0f) )
- maxt[i] = (candidate_plane[i] - p0->a1d[i]) / pdir->a1d[i];
- else
- maxt[i] = -1.0f;
- }
-
- // Get largest of the maxt's for final choice of intersection
- which_plane = 0;
- for (i = 1; i < 3; i++) {
- if (maxt[which_plane] < maxt[i])
- which_plane = i;
- }
-
- // check final candidate actually inside box
- if (maxt[which_plane] < 0.0f)
- return 0;
-
- for (i = 0; i < 3; i++) {
- if (which_plane != i) {
- hitpt->a1d[i] = p0->a1d[i] + maxt[which_plane] * pdir->a1d[i];
-
- if ( (hitpt->a1d[i] < min->a1d[i]) || (hitpt->a1d[i] > max->a1d[i]) )
- return 0;
- } else {
- hitpt->a1d[i] = candidate_plane[i];
- }
- }
-
- return 1;
-}
-
-/**
  * Given largest componant of normal, return i & j
  * If largest componant is negative, swap i & j
  */
Index: code/math/fvi.h
===================================================================
--- code/math/fvi.h (revision 10338)
+++ code/math/fvi.h (working copy)
@@ -104,8 +104,77 @@
 // from min to max.   If there was an intersection, then hitpt will contain
 // the point where the ray begins inside the box.
 // Fast ray-box intersection taken from Graphics Gems I, pages 395,736.
-int fvi_ray_boundingbox( vec3d *min, vec3d *max, vec3d * p0, vec3d *pdir, vec3d *hitpt );
+//int fvi_ray_boundingbox( vec3d *min, vec3d *max, vec3d * p0, vec3d *pdir, vec3d *hitpt );
 
+/**
+ * Finds intersection of a ray and an axis-aligned bounding box
+ *
+ * Given a ray with origin at p0, and direction pdir, this function
+ * returns non-zero if that ray intersects an axis-aligned bounding box
+ * from min to max.   If there was an intersection, then hitpt will contain
+ * the point where the ray begins inside the box.
+ * Fast ray-box intersection taken from Graphics Gems I, pages 395,736.
+ */
+inline int fvi_ray_boundingbox( vec3d *min, vec3d *max, vec3d * p0, vec3d *pdir, vec3d *hitpt )
+{
+  bool inside = true;
+  bool middle[3] = { true, true, true };
+  int i;
+  int which_plane;
+  float maxt[3];
+  float candidate_plane[3];
+
+  for (i = 0; i < 3; i++) {
+    if (p0->a1d[i] < min->a1d[i]) {
+      candidate_plane[i] = min->a1d[i];
+      middle[i] = false;
+      inside = false;
+    } else if (p0->a1d[i] > max->a1d[i]) {
+      candidate_plane[i] = max->a1d[i];
+      middle[i] = false;
+      inside = false;
+    }
+  }
+
+  // ray origin inside bounding box
+  if ( inside ) {
+    *hitpt = *p0;
+    return 1;
+  }
+
+  // calculate T distances to candidate plane
+  for (i = 0; i < 3; i++) {
+    if ( !middle[i] && (pdir->a1d[i] != 0.0f) )
+      maxt[i] = (candidate_plane[i] - p0->a1d[i]) / pdir->a1d[i];
+    else
+      maxt[i] = -1.0f;
+  }
+
+  // Get largest of the maxt's for final choice of intersection
+  which_plane = 0;
+  for (i = 1; i < 3; i++) {
+    if (maxt[which_plane] < maxt[i])
+      which_plane = i;
+  }
+
+  // check final candidate actually inside box
+  if (maxt[which_plane] < 0.0f)
+    return 0;
+
+  for (i = 0; i < 3; i++) {
+    if (which_plane != i) {
+      hitpt->a1d[i] = p0->a1d[i] + maxt[which_plane] * pdir->a1d[i];
+
+      if ( (hitpt->a1d[i] < min->a1d[i]) || (hitpt->a1d[i] > max->a1d[i]) )
+        return 0;
+    } else {
+      hitpt->a1d[i] = candidate_plane[i];
+    }
+  }
+
+  return 1;
+}
+
 // sphere polygon collision prototypes
 
 // Given a polygon vertex list and a moving sphere, find the first contact the sphere makes with the edge, if any
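
Note on fvi_ray_boundingbox() above: for anyone who wants to poke at the ray/box test outside the engine, here is a standalone sketch of the same Graphics Gems approach. box_vec3 and ray_hits_box() are hypothetical stand-ins (the real code uses vec3d and its a1d[] member), so treat it as an illustration rather than a drop-in replacement.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal stand-in for the engine's vec3d (which exposes a1d[3]).
struct box_vec3 {
    float a1d[3];
};

// Returns 1 and fills hitpt when the ray (origin p0, direction dir) hits the
// box [bmin, bmax]; returns 0 otherwise. An origin inside the box is a hit.
int ray_hits_box(const box_vec3 &bmin, const box_vec3 &bmax,
                 const box_vec3 &p0, const box_vec3 &dir, box_vec3 *hitpt)
{
    assert(hitpt != NULL);

    bool inside = true;
    bool middle[3] = { true, true, true };
    float plane[3] = { 0.0f, 0.0f, 0.0f };
    float t[3];
    int i;

    // For each axis, pick the face the ray would have to enter through.
    for (i = 0; i < 3; i++) {
        if (p0.a1d[i] < bmin.a1d[i]) {
            plane[i] = bmin.a1d[i];
            middle[i] = false;
            inside = false;
        } else if (p0.a1d[i] > bmax.a1d[i]) {
            plane[i] = bmax.a1d[i];
            middle[i] = false;
            inside = false;
        }
    }

    if (inside) {
        *hitpt = p0;
        return 1;
    }

    // Parametric distance to each candidate plane (-1 marks "no candidate").
    for (i = 0; i < 3; i++) {
        if (!middle[i] && dir.a1d[i] != 0.0f)
            t[i] = (plane[i] - p0.a1d[i]) / dir.a1d[i];
        else
            t[i] = -1.0f;
    }

    // The largest distance selects the face that is crossed last.
    int which = 0;
    for (i = 1; i < 3; i++) {
        if (t[which] < t[i])
            which = i;
    }

    if (t[which] < 0.0f)
        return 0;               // the box is behind the ray

    // The hit point on that face must lie within the box on the other axes.
    for (i = 0; i < 3; i++) {
        if (i != which) {
            hitpt->a1d[i] = p0.a1d[i] + t[which] * dir.a1d[i];
            if (hitpt->a1d[i] < bmin.a1d[i] || hitpt->a1d[i] > bmax.a1d[i])
                return 0;
        } else {
            hitpt->a1d[i] = plane[i];
        }
    }
    return 1;
}
```

Note that hitpt may be partially written even when the function returns 0, which matches the behaviour of the routine in the patch.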
Index: code/math/vecmat.cpp
===================================================================
--- code/math/vecmat.cpp (revision 10338)
+++ code/math/vecmat.cpp (working copy)
@@ -318,31 +318,9 @@
 }
 #endif
 
-//returns magnitude of a vector
-float vm_vec_mag(vec3d *v)
-{
- float x,y,z,mag1, mag2;
- x = v->xyz.x*v->xyz.x;
- y = v->xyz.y*v->xyz.y;
- z = v->xyz.z*v->xyz.z;
 
- mag1 = x+y+z;
 
- mag2 = fl_sqrt(mag1);
- return mag2;
-}
 
-//returns squared magnitude of a vector, useful if you want to compare distances
-float vm_vec_mag_squared(vec3d *v)
-{
- float x,y,z,mag1;
- x = v->xyz.x*v->xyz.x;
- y = v->xyz.y*v->xyz.y;
- z = v->xyz.z*v->xyz.z;
- mag1 = x+y+z;
- return mag1;
-}
-
 float vm_vec_dist_squared(vec3d *v0, vec3d *v1)
 {
  float dx, dy, dz;
Index: code/math/vecmat.h
===================================================================
--- code/math/vecmat.h (revision 10338)
+++ code/math/vecmat.h (working copy)
@@ -15,7 +15,7 @@
 #include <float.h>
 #include "globalincs/pstypes.h"
 
-//#define _INLINE_VECMAT
+#define _INLINE_VECMAT
 
 #define vm_is_vec_nan(v) (_isnan((v)->xyz.x) || _isnan((v)->xyz.y) || _isnan((v)->xyz.z))
 
@@ -225,10 +225,33 @@
 void vm_vec_projection_onto_plane (vec3d *projection, vec3d *src, vec3d *normal);
 
 //returns magnitude of a vector
-float vm_vec_mag(vec3d *v);
+//float vm_vec_mag(vec3d *v);
+//returns magnitude of a vector
+inline float vm_vec_mag(vec3d *v)
+{
+  float x,y,z,mag1, mag2;
+  x = v->xyz.x*v->xyz.x;
+  y = v->xyz.y*v->xyz.y;
+  z = v->xyz.z*v->xyz.z;
 
+  mag1 = x+y+z;
+
+  mag2 = fl_sqrt(mag1);
+  return mag2;
+}
+
 // returns the square of the magnitude of a vector (useful if comparing distances)
-float vm_vec_mag_squared(vec3d* v);
+//float vm_vec_mag_squared(vec3d* v);
+//returns squared magnitude of a vector, useful if you want to compare distances
+inline float vm_vec_mag_squared(vec3d *v)
+{
+  float x,y,z,mag1;
+  x = v->xyz.x*v->xyz.x;
+  y = v->xyz.y*v->xyz.y;
+  z = v->xyz.z*v->xyz.z;
+  mag1 = x+y+z;
+  return mag1;
+}
 
 // returns the square of the distance between two points (fast and exact)
 float vm_vec_dist_squared(vec3d *v0, vec3d *v1);
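
Note on the inlined magnitude helpers above: the squared version exists so that length comparisons can skip the square root entirely. A small standalone sketch of that usage follows. vec3f, vec_mag(), vec_mag_squared() and vec_within() are hypothetical names, not the engine's vec3d API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the xyz part of the engine's vec3d.
struct vec3f {
    float x, y, z;
};

// Squared length: three multiplies and two adds, no square root.
inline float vec_mag_squared(const vec3f &v)
{
    return v.x * v.x + v.y * v.y + v.z * v.z;
}

// Length, built on the squared version.
inline float vec_mag(const vec3f &v)
{
    return std::sqrt(vec_mag_squared(v));
}

// True when v is no longer than max_len. Comparing squared values gives the
// same answer as comparing lengths because both sides are non-negative.
inline bool vec_within(const vec3f &v, float max_len)
{
    assert(max_len >= 0.0f);
    return vec_mag_squared(v) <= max_len * max_len;
}
```

Range checks in hot loops are the obvious place to use the vec_within() pattern, since the square root is only needed when the actual distance is reported.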
Index: code/model/modelcollide.cpp
===================================================================
--- code/model/modelcollide.cpp (revision 10338)
+++ code/model/modelcollide.cpp (working copy)
@@ -76,7 +76,7 @@
 // Returns non-zero if vector from p0 to pdir
 // intersects the bounding box.
 // hitpos could be NULL, so don't fill it if it is.
-int mc_ray_boundingbox( vec3d *min, vec3d *max, vec3d * p0, vec3d *pdir, vec3d *hitpos )
+inline int mc_ray_boundingbox( vec3d *min, vec3d *max, vec3d * p0, vec3d *pdir, vec3d *hitpos )
 {
 
  vec3d tmp_hitpos;
@@ -481,7 +481,7 @@
 
 int model_collide_sub( void *model_ptr );
 
-void model_collide_sortnorm(ubyte * p)
+inline void model_collide_sortnorm(ubyte * p)
 {
  int frontlist = w(p+36);
  int backlist = w(p+40);
@@ -509,7 +509,7 @@
 
 //calls the object interpreter to render an object.  The object renderer
 //is really a seperate pipeline. returns true if drew
-int model_collide_sub(void *model_ptr )
+inline int model_collide_sub(void *model_ptr )
 {
  ubyte *p = (ubyte *)model_ptr;
  int chunk_type, chunk_size;
@@ -602,7 +602,7 @@
  }
 }
 
-void model_collide_bsp(bsp_collision_tree *tree, int node_index)
+inline void model_collide_bsp(bsp_collision_tree *tree, int node_index)
 {
  if ( tree->node_list == NULL || tree->n_verts <= 0) {
  return;
@@ -949,7 +949,7 @@
  return false;
 }
 
-bool mc_check_sldc(int offset)
+inline bool mc_check_sldc(int offset)
 {
  if (offset > Mc_pm->sldc_size-5) //no way is this big enough
  return false;
@@ -999,7 +999,7 @@
 }
 
 // checks a vector collision against a ships shield (if it has shield points defined).
-void mc_check_shield()
+inline void mc_check_shield()
 {
  int i;
 
@@ -1031,7 +1031,7 @@
 
 // This function recursively checks a submodel and its children
 // for a collision with a vector.
-void mc_check_subobj( int mn )
+inline void mc_check_subobj( int mn )
 {
  vec3d tempv;
  vec3d hitpt; // used in bounding box check
Index: code/object/objcollide.cpp
===================================================================
--- code/object/objcollide.cpp (revision 10338)
+++ code/object/objcollide.cpp (working copy)
@@ -1287,53 +1287,6 @@
  overlapped = true;
 }
 
-float obj_get_collider_endpoint(int obj_num, int axis, bool min)
-{
- if ( Objects[obj_num].type == OBJ_BEAM ) {
- beam *b = &Beams[Objects[obj_num].instance];
-
- // use the last start and last shot as endpoints
- float min_end, max_end;
- if ( b->last_start.a1d[axis] > b->last_shot.a1d[axis] ) {
- min_end = b->last_shot.a1d[axis];
- max_end = b->last_start.a1d[axis];
- } else {
- min_end = b->last_start.a1d[axis];
- max_end = b->last_shot.a1d[axis];
- }
-
- if ( min ) {
- return min_end;
- } else {
- return max_end;
- }
- } else if ( Objects[obj_num].type == OBJ_WEAPON ) {
- float min_end, max_end;
-
- if ( Objects[obj_num].pos.a1d[axis] > Objects[obj_num].last_pos.a1d[axis] ) {
- min_end = Objects[obj_num].last_pos.a1d[axis];
- max_end = Objects[obj_num].pos.a1d[axis];
- } else {
- min_end = Objects[obj_num].pos.a1d[axis];
- max_end = Objects[obj_num].last_pos.a1d[axis];
- }
-
- if ( min ) {
- return min_end - Objects[obj_num].radius;
- } else {
- return max_end + Objects[obj_num].radius;
- }
- } else {
- vec3d *pos = &Objects[obj_num].pos;
-
- if ( min ) {
- return pos->a1d[axis] - Objects[obj_num].radius;
- } else {
- return pos->a1d[axis] + Objects[obj_num].radius;
- }
- }
-}
-
 void obj_quicksort_colliders(SCP_vector<int> *list, int left, int right, int axis)
 {
  Assert( axis >= 0 );
Index: code/object/objcollide.h
===================================================================
--- code/object/objcollide.h (revision 10338)
+++ code/object/objcollide.h (working copy)
@@ -13,6 +13,7 @@
 #define _COLLIDESTUFF_H
 
 #include "globalincs/pstypes.h"
+#include "weapon/beam.h"
 
 class object;
 struct CFILE;
@@ -155,4 +156,54 @@
 int reject_due_collision_groups(object *A, object *B);
 
 void init_collision_info_struct(collision_info_struct *cis);
+
+void obj_collide_pair(object *A, object *B);
+
+inline float obj_get_collider_endpoint(int obj_num, int axis, bool min)
+{
+  if ( Objects[obj_num].type == OBJ_BEAM ) {
+    beam *b = &Beams[Objects[obj_num].instance];
+
+    // use the last start and last shot as endpoints
+    float min_end, max_end;
+    if ( b->last_start.a1d[axis] > b->last_shot.a1d[axis] ) {
+      min_end = b->last_shot.a1d[axis];
+      max_end = b->last_start.a1d[axis];
+    } else {
+      min_end = b->last_start.a1d[axis];
+      max_end = b->last_shot.a1d[axis];
+    }
+
+    if ( min ) {
+      return min_end;
+    } else {
+      return max_end;
+    }
+  } else if ( Objects[obj_num].type == OBJ_WEAPON ) {
+    float min_end, max_end;
+
+    if ( Objects[obj_num].pos.a1d[axis] > Objects[obj_num].last_pos.a1d[axis] ) {
+      min_end = Objects[obj_num].last_pos.a1d[axis];
+      max_end = Objects[obj_num].pos.a1d[axis];
+    } else {
+      min_end = Objects[obj_num].pos.a1d[axis];
+      max_end = Objects[obj_num].last_pos.a1d[axis];
+    }
+
+    if ( min ) {
+      return min_end - Objects[obj_num].radius;
+    } else {
+      return max_end + Objects[obj_num].radius;
+    }
+  } else {
+    vec3d *pos = &Objects[obj_num].pos;
+
+    if ( min ) {
+      return pos->a1d[axis] - Objects[obj_num].radius;
+    } else {
+      return pos->a1d[axis] + Objects[obj_num].radius;
+    }
+  }
+}
+
 #endif
Index: code/object/object.cpp
===================================================================
--- code/object/object.cpp (revision 10338)
+++ code/object/object.cpp (working copy)
@@ -1645,33 +1645,6 @@
  }
 }
 
-/**
- * Do client-side post-interpolation object movement
- */
-void obj_client_post_interpolate()
-{
- object *objp;
-
- // After all objects have been moved, move all docked objects.
- objp = GET_FIRST(&obj_used_list);
- while( objp !=END_OF_LIST(&obj_used_list) ) {
- if ( objp != Player_obj ) {
- dock_move_docked_objects(objp);
- }
- objp = GET_NEXT(objp);
- }
-
- // check collisions
- if ( Cmdline_old_collision_sys ) {
- obj_check_all_collisions();
- } else {
- obj_sort_and_collide();
- }
-
- // do post-collision stuff for beam weapons
- beam_move_all_post();
-}
-
 void obj_observer_move(float frame_time)
 {
  object *objp;
Index: code/object/object.h
===================================================================
--- code/object/object.h (revision 10338)
+++ code/object/object.h (working copy)
@@ -336,4 +336,8 @@
 int obj_get_by_signature(int sig);
 int object_get_model(object *objp);
 
+void obj_move_all_pre(object *objp, float frametime);
+
+void obj_check_object( object *obj );
+
 #endif
Index: code/osapi/osapi_unix.cpp
===================================================================
--- code/osapi/osapi_unix.cpp (revision 10338)
+++ code/osapi/osapi_unix.cpp (working copy)
@@ -14,6 +14,9 @@
 #include <stdio.h>
 #include <fcntl.h>
 #include <stdarg.h>
+#ifdef __linux__
+#include <execinfo.h>
+#endif
 
 #include "globalincs/pstypes.h"
 #include "io/key.h"
@@ -237,8 +240,30 @@
 
 void debug_int3(char *file, int line)
 {
- mprintf(("Int3(): From %s at line %d\n", file, line));
+#ifndef NDEBUG
+#ifdef __linux__
+#define SIZE 1024
+  char **symbols;
+  int i, numstrings;
+  void *buffer[SIZE];
+#endif
+#endif
+  mprintf(("Int3(): From %s at line %d\n", file, line));
 
+#ifndef NDEBUG
+#ifdef __linux__
+ numstrings = backtrace(buffer, SIZE);
+ symbols = backtrace_symbols(buffer, numstrings);
+ if(symbols != NULL)
+ {
+   for(i = 0; i < numstrings; i++)
+   {
+     mprintf(("%s\n", symbols[i]));
+   }
+ }
+ free(symbols);
+#endif
+#endif
  // we have to call os_deinit() before abort() so we make sure that SDL gets
  // closed out and we don't lose video/input control
  os_deinit();
Index: configure.ac
===================================================================
--- configure.ac (revision 10338)
+++ configure.ac (working copy)
@@ -219,7 +219,7 @@
 if test "$fs2_debug" = "yes" ; then
  AC_DEFINE([_DEBUG])
  D_CFLAGS="$D_CFLAGS -O0 -g -Wall -Wextra -Wno-unused-parameter -Wno-write-strings -Wshadow -funroll-loops"
- D_LDFLAGS="$D_LDFLAGS -g"
+ D_LDFLAGS="$D_LDFLAGS -g -rdynamic"
 
  if test "$fs2_fred" = "yes" ; then
  AC_SUBST(FS2_BINARY, ["wxFRED2_${PACKAGE_VERSION}_DEBUG"])

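For anyone skimming the objcollide hunks above: a minimal sketch of the kind of per-axis endpoint calculation that obj_get_collider_endpoint() does, assuming a simplified stand-in vector type. Vec3, swept_endpoint() and intervals_overlap() are hypothetical names for illustration only and are not in the patch or the engine. The beam case corresponds roughly to a radius of zero with last_start/last_shot as the two points, and a stationary object to both points being equal.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the engine's vec3d (only the a1d[] view is mirrored).
struct Vec3 {
    float a1d[3];
};

// Min or max extent along one axis of a sphere of the given radius
// that moved from last_pos to pos during the frame.
inline float swept_endpoint(const Vec3 &last_pos, const Vec3 &pos,
                            float radius, int axis, bool min)
{
    assert(axis >= 0 && axis < 3);

    float lo = std::min(last_pos.a1d[axis], pos.a1d[axis]);
    float hi = std::max(last_pos.a1d[axis], pos.a1d[axis]);

    return min ? lo - radius : hi + radius;
}

// Two colliders can only touch if their intervals overlap on the sorted axis.
inline bool intervals_overlap(float min_a, float max_a, float min_b, float max_b)
{
    return min_a <= max_b && min_b <= max_a;
}
```

Since a function like this is called many times per frame while sorting colliders, defining it inline in the header gives the compiler the chance to inline it at every call site, which seems to be the intent of moving it out of objcollide.cpp.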
Title: Re: Performance Improvements (Updated 25/01/14)
Post by: m!m on January 25, 2014, 03:41:37 am
I did some performance testing with the WiH opening cutscene:
(http://i.imgur.com/66YqU6a.png)
Title: Re: Performance Improvements (Updated 25/01/14)
Post by: Flaming_Sword on June 21, 2014, 11:38:54 pm
 :bump:

Bumping for an update to the latest trunk, plus Windows executables now that I can build them again.