Off-Topic Discussion > The Classics

Epic Bughunt Claims Sanity, Human Blood

<< < (2/15) > >>

The E:
The Bug From Hell, or: Is it 6AM again?

Here I am, explaining the technical bits.

And bringing you a shot of the bug in action:


The first inkling I had that something was wrong was a small Assertion while running the buggy mission. Now, usually, Assertions are good things to have when debugging, it means that a) there's a clear starting point, and b) another developer had the foresight to check his data and provide a warning in case it was bogus.
So, with the assertion in hand, I dug in. For those of you interested, the assertion was "im > 0.0f" from vecmat.cpp, line 543. This info, by itself, is useless, since the function that the game is processing at that time (vm_vec_normalize()) is called from literally dozens of places.
However, since I happen to have Visual Studio (Microsoft's developer environment), I was able to perform what amounts to a live dissection of the crashed program, yielding the ever-important callstack. This told me the state of the engine at the time of the crash.

What I saw, however, was extremely puzzling. The callstack pointed towards the function that is responsible for generating the three-dimensional sound environment1 , which needs the current camera position as a starting point.
In this case, the global variable that stores the camera's position was apparently corrupted. This is something that should never, ever happen, and I was astonished to see that this was the first visible error message thrown.

So, I did what a developer debugging a large codebase does in these cases. I sprinkled breakpoints2 everywhere I thought they were needed, and ran the mission again.
And this is basically what I did the next few hours. Run the game, inspect the data at the breakpoints, reset breakpoints, run game again, lather, rinse, repeat.
Gradually, through trial and error, it was revealed that we were dealing with a cascading failure.
One that was literally happening in the tiny space between two frames.
As far as I could determine, what happened was that at some point, the player's throttle setting got invalid. When this invalid throttle setting was fed into the physics engine, it tried to calculate the player's velocity based on that, which again yielded invalid results. Then, in the next frame, the physics engine tried to calculate the player's new position using the invalid speed data, which produced coordinates that lay somewhere outside of the known universe.
This data, in turn, was used by the rendering engine as input, which caused it to basically throw up its arms and do as it was told (the mental image I had was of a tired office worker, who basically says "If management wants me to do something stupid, I'll do something stupid").
And on and on, until that assertion in a part of the code that was so far removed from the actual cause as possible.

Which brings me to the cause.
I can't take credit for discovering it, IssMneur was the one who nailed it by stumbling across the strange fact that if the automatic speed matching was turned off, the bug didn't happen.
Coupled with Battutas' discovery that the difference between the bug appearing and it staying hidden was a repair-subsystem sexp, he investigated, tracing through the code until he found the cause of this error.
Now, what happened was:

1. Player's engines get sabotaged via sabotage-subsystem sexp
2. Seeing that the player no longer has engines, the game sets the maximum speed the player can reach to 0.

This is normal so far.
However, if the player happens to be matching speed with another ship, the game tries to calculate the throttle setting necessary to do that by dividing the target's speed by the player's current max speed.

Feel free to headdesk now. I know I did. Because the code never checked to see if the divisor was greater than zero, we had a classical error on our hands.

End result, after countless hours of staring at variables? Two new lines of code that should bury this beast for good.


1: Basically, the game tries to place each sound source that is active in a 3d environment around the currently active camera, and adjust the sound effects' volume accordingly.
2: A Breakpoint is a marker that enables the debugger (NOT the debug build, I'm talking about the Visual Studio debugger here) to automatically halt execution. They are invaluable when trying to analyze a running program.

mjn.mixael:
I'm astonished. Truly. I would have given up hours and hours ago.

Maybe some sort of division error checking should be added to FSO... I'm not a coder so I don't even know if that is possible or plausible.

Commander Zane:
That **** sounds intense, nice job on cracking the problem.

jr2:
Awesome work, guys!  :nod:

Droid803:
Oh, on a somewhat unrelated note, is this issue connected to the one where sometimes, upon re-trying a mission, the player ship is...gone?

Navigation

[0] Message Index

[#] Next page

[*] Previous page

Go to full version