Hard Light Productions Forums

Modding, Mission Design, and Coding => FS2 Open Coding - The Source Code Project (SCP) => Topic started by: AdmiralRalwood on July 04, 2016, 09:12:43 am

Title: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: AdmiralRalwood on July 04, 2016, 09:12:43 am
This is the tale of change-ship-class and how much I hate love it.

Once upon a time, DefCynodont119 was making a mission involving the change-ship-class SEXP.

Actually, that's too recent. Let's go back in time a bit further.

Once upon a time, Goober5000 found a fascinating function called "change_ship_type()" and decided to make a SEXP that called it.

Actually, that's still too recent. Let's go back a bit further.

Once upon a time, some coder(s) at Volition wrote a function named "change_ship_type()", to be used whenever a ship needed to change types. Fairly straightforward, right? This was for things like FRED (where you can change the class of a ship in the ship editor) and the loadout screen (where you can change the class of your and your wingmates' ships). Somewhat important about these things is that they don't take place while a mission is actively ongoing; that will turn out to bite us in the ass later.

So, fast-forward. Descent: FreeSpace is released... you know what? Let's fast-forward a bit further than that. Goober5000 sees "change_ship_type()" and presumably imagines the possibilities exposing such a function to FREDers could create. Anyone who's seen Axem's work probably knows that the possibilities are, to put it succinctly, really cool. So the change-ship-class SEXP is created.

This has made a lot of people very angry and been widely regarded as a bad move. This made a lot of people very happy, in the years to come, but was plagued with problems; in that very commit, Goober apparently noticed that change_ship_type() would reset the ship's shields and hull, which is not particularly desired in most use-cases, although didn't take the time to fix it properly at first (remember that this was January of 2003 and this was revision 278(!) of a repository that hit over 11,000 revisions before we made the switch to Git, over twelve years later). Over time, lots of other issues would be noticed and fixed (or attempted to be fixed, and then actually fixed later, as is often the case with FSO), like making it handle countermeasure counts, afterburner fuel, maximum speed as adjusted by engine power, and so on and so forth.

One particular issue lurked, however, unfixed all the while.

This takes us back to where we (almost) started: DefCynodont119 was making a mission involving the change-ship-class SEXP. But what's this?

I need someone to help us Identify the cause of a highly complicated CTD Error/Bug:

I am having problems with the Change-Ship-Class SEXP

Spontaneous crashes without an FSO error message! Debug log not showing any helpful information! Enter an apparently-bored coder who clearly had nothing better to do with his time:

Can you provide a test mission to recreate this crash?

And so the debugging process began. It was quite simple at first; the executable crashed almost immediately, and the debugger was quite clear about what the problem was: the player was targeting a subsystem from the version of the ship before change_ship_type() had been called, which was pointing to a submodel number that did not exist on the new model, causing an out-of-bounds array access on the polymodel's submodel list, resulting in FSO attempting to interpret what appeared to be a segment of shader code as though it were the data for a submodel, leading to nonsensical data which ultimately caused a crash when invalid values were sorted (or, to get technical, the executable attempted to sort them, discovered a non-comparable value, and threw an exception).

"Easily fixed," said I, and proceeded to quickly write some code in change_ship_type() to handle the player targeting a subsystem on the ship changing classes.

I found the problem

Ah hah hah hah hah hah... "the problem," he says. Look at this naïve idiot. Anyway, this would turn out to only be the beginning of the work to be done on change_ship_type(). As it turned out, the next problem to be solved was also fairly straightforward:

OK, so I messed around with a variation of the test mission, only this time with the workarounds and this happened:

Code: [Select]
Assert: pair2.dist >= 0
File: shiphit.cpp
Line: 1088

You see, sometimes when ships get hit, they trail "spark" particles behind them. These are stored as, basically, a submodel and an offset for the sparks to emit from. When these sparks referred to a submodel that no longer existed... well, let's just say that despite the differences in these two errors (one throwing an exception and one triggering an assertion), they both boiled down to the same issue. At first, I thought I'd have to do some complicated trickery to find out which submodels were referenced by which subsystems, and then try to transfer the spark data... and then it occurred to me that any change in the model renders the offset coordinates more-or-less completely useless even if you could find equivalent subsystems, so I went with the far simpler solution of simply clearing all spark data.

That was not the end of it. No, not the end of it by far.

You see, the AI can also target subsystems. And that targeting is also handled by pointers, just like the player's targeting (in point of fact, the player's target is actually handled by an AI object, so they're exactly the same). So the code I'd written to handle the player's target would need to be generalized to handle every AI, or you get the exact same problem from non-player fighters. Right as I finish fixing that issue, I smack my forehead as I realize homing weapons also have a subsystem pointer, pointing towards the subsystem they're homing in on.

The list of things change_ship_type() needs to iterate over in order to fix out-of-date subsystem references keeps growing longer, and yet the mission doesn't stop crashing. "What's crashing this time? An array of the last subsystem each player targeted on each ship? Agh! Okay, let's fix that, too." Still crashing. "What. EMP missiles now? That doesn't even make sense! The AI's EMP target-scrambling doesn't target subsystems!" And yet, as intermittent as the crash was, it was still crashing. "Okay," said I, "give me the version of the test mission you're using and I'll do my best to make it crash."

Well, I did. And it did. And, to my considerable annoyance/perplexion... it was crashing due to an issue I thought had already been fixed.

Quote
<MageKing17> Okay, this is ****in' weird.
<MageKing17> I went through all that trouble to fix the AI targeting invalid subsystems in change_ship_type(), and then get a crash because... an AI is targeting an invalid subsystem!

It got weirder.

Quote
<MageKing17> I'm so confused.
<MageKing17> This ship has no goals to target a subsystem.
<MageKing17> How is it targeting a subsystem?

I scoured the code, looking for every reference I could find to ai_info.targeted_subsys being set; it pretty much only happened in set_targeted_subsys(), which was usually called with NULL (meaning "clear any subsystem target"). One of the very few places it was called with an actual subsystem was in ai_set_attack_subsystem(), which... is called in exactly one place, if the AI has a "destroy subsystem", "disable", or "disarm" goal.

Quote
<MageKing17> what the ****
<MageKing17> there should be no reason for a non-turret AI to ever be targeting a subsystem without having a goal to destroy a subsystem (or disable/disarm)

I was baffled. I was tearing my hear out (no, really; it hurt like hell). As far as I could tell, this scenario could not be happening, and yet it was clearly happening, or my debugger wouldn't be showing me this AI targeting a nonexistant turret.

But just then...

Quote
<MageKing17> oh wait

...I finally noticed...

Quote
<MageKing17> what's this function over here that looks like the exact problem

...that there was this one other place that called set_targeted_subsys() with a subsystem instead of a null pointer. Specifically, ai_big_strafe_maybe_attack_turret().

Quote
* MageKing17 facepalms

"What does ai_big_strafe_maybe_attack_turret() do?" you ask? Well, it's called when an AI either tries to evade a projectile, or gets hit by one while strafing a capital ship. It makes the AI direct its attention towards the turret that fired said projectile (depending somewhat on AI class), on the reasoning that taking out turrets that are shooting at it would help it to accomplish its other tasks without blowing up in the process. "How does it know which turret fired the projectile?" you might be wondering. Well, I'm glad you asked!

As it happens, it's handled by a pointer to the firing subsystem, stored on the projectile.

:banghead:

So, now we need to iterate not just through homing weapons, but every single weapon in the game looking for both homing subsystems and the turrets that (may have) fired them. However, with that fixed I couldn't get the test mission to crash any more, so hopefully (fingers crossed) this bug is finally fixed (although the PR is still waiting on code review (https://github.com/scp-fs2open/fs2open.github.com/pull/634) as of this posting merged now).

The End? Oh please oh please be the end of this
Title: Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: General Battuta on July 04, 2016, 09:27:42 am
Holy ****
Title: Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: Spoon on July 04, 2016, 10:10:43 am
So basically Ph'nglui mglw'nafh Codethulhu GitHub wgah'nagl fhtagn.
Codethulhu attempts to drain the sanity of yet another poor coder.

Interesting read, also very relatable.
Title: Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: MatthTheGeek on July 04, 2016, 10:17:57 am
Spaghetti code debugging at its finest.
Title: Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: General Battuta on July 04, 2016, 10:58:17 am
It's funny how when a thing breaks, and you check all the possible ways it could've broken, and then figure out a few more ways it could've broken, it's always the thing you didn't quite check.
Title: Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: AdmiralRalwood on July 04, 2016, 11:02:25 am
It's funny how when a thing breaks, and you check all the possible ways it could've broken, and then figure out a few more ways it could've broken, it's always the thing you didn't quite check.
I tried so hard to search the code for any non-temporary subsystem pointers, too; there's just so much code to sort through that it's all too easy to miss something vitally important.
Title: Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: Mongoose on July 04, 2016, 12:42:54 pm
I feel bad for loving these sorts of stories as much as I do, because I know there must be a pile of pulled-out hair sitting next to the keyboard associated with them.
Title: Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: 666maslo666 on July 04, 2016, 12:55:02 pm
At least it crashed. I think the worst bugs are logic bugs where debugger shows nothing but there is something subtly wrong. I am just an amateur gamedev and one of those bad boys made me contemplate giving up on my hobby and go collect stamps instead: all my meshes were rotating correctly except for simple quads, which were rotating, but not always as they should. After lots of trying, I gave up on that bug as unfixable and kinda worked around it if needed. Months later, I randomly noticed a possible bug in model loading code, and after fixing it, my quads were suddenly rotating as they should. Turns out the quad model geometry (and only the quad model geometry) failed to load. But the model was still valid and could be rendered as a quad (two triangles, six vertices), except that it took those vertices from the model in the buffer that was next in line. Which just happened to be a cube. And when you render only six vertices of a cube, you get a quad. A quad with offset model center, but a still quad nonetheless. So all this time, I was using top side of a cube as a quad without knowing it..

And then I realize that FS codebase is 100x larger than my amateur engine and written by many people..  :eek2:
Title: Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: DefCynodont119 on July 04, 2016, 03:47:34 pm
This, This Is the best post ever, Thank you.

Just To illustrate how elusive the ai_big_strafe_maybe_attack_turret() crash was:

After 3+ hours of the testing over and over, I was only able to trigger two errors. One was an unexplained CTD, and the other was a bizarre case where: 1, after 10-15 seconds of doing the thing that used to make it crash the sound started cutting in and out, 2, then the weapon/engine effects flickered then disappeared, 3, the whole screen just went black. this happened over the course of about 1.5 minuets.   :shaking: unfortunately i was not in Debug when that happened.


However; most of my efforts to wreck the game caused nothing.
I can no longer reproduce the crash with more then 7ish% chance of it happening as far as I can tell.

That was me giving up on just reproducing the error when AdmiralRalwood pushed on, venturing deep into the temple of code, risking sanity and injury to save the world from entire legions of crashes that outwardly look the same.

Title: Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: niffiwan on July 04, 2016, 05:46:39 pm
Thank you for a very entertaining read sir

/me tips his hat
Title: Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: AdmiralRalwood on July 04, 2016, 07:39:45 pm
That was me giving up on just reproducing the error when AdmiralRalwood pushed on, venturing deep into the temple of code, risking sanity and injury to save the world from entire legions of crashes that outwardly look the same.
entire legions of crashes
legion
I see what you did there.
Title: Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: Goober5000 on July 05, 2016, 12:58:02 am
Awesome read.  Awesome bugfix.

AdmiralRalwood, can I commission you to write the story of my subsystem-look-at refactor?
Title: Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: AdmiralRalwood on July 05, 2016, 01:13:39 am
AdmiralRalwood, can I commission you to write the story of my subsystem-look-at refactor?
I'm flattered, but I doubt I could do it justice when I wasn't the one working on it... which, unfortunately, I don't understand the vector math either.
Title: Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: Phantom Hoover on July 05, 2016, 08:46:53 am
This wouldn't have happened if you'd used Rust!!!
Title: Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: niffiwan on July 05, 2016, 04:35:09 pm
you mean that FSO wouldn't have happened? :p
Title: Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Post by: AdmiralRalwood on July 06, 2016, 05:17:53 am
First nightly build with this fix (as well as APNG support) is here (http://www.hard-light.net/forums/index.php?topic=92196.0).