Author Topic: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom  (Read 3827 times)

0 Members and 1 Guest are viewing this topic.

Offline AdmiralRalwood

  • 211
  • The Cthulhu programmer himself!
    • Skype
    • Steam
    • Twitter
AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
This is the tale of change-ship-class and how much I hate love it.

Once upon a time, DefCynodont119 was making a mission involving the change-ship-class SEXP.

Actually, that's too recent. Let's go back in time a bit further.

Once upon a time, Goober5000 found a fascinating function called "change_ship_type()" and decided to make a SEXP that called it.

Actually, that's still too recent. Let's go back a bit further.

Once upon a time, some coder(s) at Volition wrote a function named "change_ship_type()", to be used whenever a ship needed to change types. Fairly straightforward, right? This was for things like FRED (where you can change the class of a ship in the ship editor) and the loadout screen (where you can change the class of your and your wingmates' ships). Somewhat important about these things is that they don't take place while a mission is actively ongoing; that will turn out to bite us in the ass later.

So, fast-forward. Descent: FreeSpace is released... you know what? Let's fast-forward a bit further than that. Goober5000 sees "change_ship_type()" and presumably imagines the possibilities exposing such a function to FREDers could create. Anyone who's seen Axem's work probably knows that the possibilities are, to put it succinctly, really cool. So the change-ship-class SEXP is created.

This has made a lot of people very angry and been widely regarded as a bad move. This made a lot of people very happy, in the years to come, but was plagued with problems; in that very commit, Goober apparently noticed that change_ship_type() would reset the ship's shields and hull, which is not particularly desired in most use-cases, although didn't take the time to fix it properly at first (remember that this was January of 2003 and this was revision 278(!) of a repository that hit over 11,000 revisions before we made the switch to Git, over twelve years later). Over time, lots of other issues would be noticed and fixed (or attempted to be fixed, and then actually fixed later, as is often the case with FSO), like making it handle countermeasure counts, afterburner fuel, maximum speed as adjusted by engine power, and so on and so forth.

One particular issue lurked, however, unfixed all the while.

This takes us back to where we (almost) started: DefCynodont119 was making a mission involving the change-ship-class SEXP. But what's this?

I need someone to help us Identify the cause of a highly complicated CTD Error/Bug:

I am having problems with the Change-Ship-Class SEXP

Spontaneous crashes without an FSO error message! Debug log not showing any helpful information! Enter an apparently-bored coder who clearly had nothing better to do with his time:

Can you provide a test mission to recreate this crash?

And so the debugging process began. It was quite simple at first; the executable crashed almost immediately, and the debugger was quite clear about what the problem was: the player was targeting a subsystem from the version of the ship before change_ship_type() had been called, which was pointing to a submodel number that did not exist on the new model, causing an out-of-bounds array access on the polymodel's submodel list, resulting in FSO attempting to interpret what appeared to be a segment of shader code as though it were the data for a submodel, leading to nonsensical data which ultimately caused a crash when invalid values were sorted (or, to get technical, the executable attempted to sort them, discovered a non-comparable value, and threw an exception).

"Easily fixed," said I, and proceeded to quickly write some code in change_ship_type() to handle the player targeting a subsystem on the ship changing classes.

I found the problem

Ah hah hah hah hah hah... "the problem," he says. Look at this naïve idiot. Anyway, this would turn out to only be the beginning of the work to be done on change_ship_type(). As it turned out, the next problem to be solved was also fairly straightforward:

OK, so I messed around with a variation of the test mission, only this time with the workarounds and this happened:

Code: [Select]
Assert: pair2.dist >= 0
File: shiphit.cpp
Line: 1088

You see, sometimes when ships get hit, they trail "spark" particles behind them. These are stored as, basically, a submodel and an offset for the sparks to emit from. When these sparks referred to a submodel that no longer existed... well, let's just say that despite the differences in these two errors (one throwing an exception and one triggering an assertion), they both boiled down to the same issue. At first, I thought I'd have to do some complicated trickery to find out which submodels were referenced by which subsystems, and then try to transfer the spark data... and then it occurred to me that any change in the model renders the offset coordinates more-or-less completely useless even if you could find equivalent subsystems, so I went with the far simpler solution of simply clearing all spark data.

That was not the end of it. No, not the end of it by far.

You see, the AI can also target subsystems. And that targeting is also handled by pointers, just like the player's targeting (in point of fact, the player's target is actually handled by an AI object, so they're exactly the same). So the code I'd written to handle the player's target would need to be generalized to handle every AI, or you get the exact same problem from non-player fighters. Right as I finish fixing that issue, I smack my forehead as I realize homing weapons also have a subsystem pointer, pointing towards the subsystem they're homing in on.

The list of things change_ship_type() needs to iterate over in order to fix out-of-date subsystem references keeps growing longer, and yet the mission doesn't stop crashing. "What's crashing this time? An array of the last subsystem each player targeted on each ship? Agh! Okay, let's fix that, too." Still crashing. "What. EMP missiles now? That doesn't even make sense! The AI's EMP target-scrambling doesn't target subsystems!" And yet, as intermittent as the crash was, it was still crashing. "Okay," said I, "give me the version of the test mission you're using and I'll do my best to make it crash."

Well, I did. And it did. And, to my considerable annoyance/perplexion... it was crashing due to an issue I thought had already been fixed.

Quote
<MageKing17> Okay, this is ****in' weird.
<MageKing17> I went through all that trouble to fix the AI targeting invalid subsystems in change_ship_type(), and then get a crash because... an AI is targeting an invalid subsystem!

It got weirder.

Quote
<MageKing17> I'm so confused.
<MageKing17> This ship has no goals to target a subsystem.
<MageKing17> How is it targeting a subsystem?

I scoured the code, looking for every reference I could find to ai_info.targeted_subsys being set; it pretty much only happened in set_targeted_subsys(), which was usually called with NULL (meaning "clear any subsystem target"). One of the very few places it was called with an actual subsystem was in ai_set_attack_subsystem(), which... is called in exactly one place, if the AI has a "destroy subsystem", "disable", or "disarm" goal.

Quote
<MageKing17> what the ****
<MageKing17> there should be no reason for a non-turret AI to ever be targeting a subsystem without having a goal to destroy a subsystem (or disable/disarm)

I was baffled. I was tearing my hear out (no, really; it hurt like hell). As far as I could tell, this scenario could not be happening, and yet it was clearly happening, or my debugger wouldn't be showing me this AI targeting a nonexistant turret.

But just then...

Quote
<MageKing17> oh wait

...I finally noticed...

Quote
<MageKing17> what's this function over here that looks like the exact problem

...that there was this one other place that called set_targeted_subsys() with a subsystem instead of a null pointer. Specifically, ai_big_strafe_maybe_attack_turret().

Quote
* MageKing17 facepalms

"What does ai_big_strafe_maybe_attack_turret() do?" you ask? Well, it's called when an AI either tries to evade a projectile, or gets hit by one while strafing a capital ship. It makes the AI direct its attention towards the turret that fired said projectile (depending somewhat on AI class), on the reasoning that taking out turrets that are shooting at it would help it to accomplish its other tasks without blowing up in the process. "How does it know which turret fired the projectile?" you might be wondering. Well, I'm glad you asked!

As it happens, it's handled by a pointer to the firing subsystem, stored on the projectile.

:banghead:

So, now we need to iterate not just through homing weapons, but every single weapon in the game looking for both homing subsystems and the turrets that (may have) fired them. However, with that fixed I couldn't get the test mission to crash any more, so hopefully (fingers crossed) this bug is finally fixed (although the PR is still waiting on code review as of this posting merged now).

The End? Oh please oh please be the end of this
« Last Edit: July 04, 2016, 07:38:16 pm by AdmiralRalwood »
Ph'nglui mglw'nafh Codethulhu GitHub wgah'nagl fhtagn.

schrödinbug (noun) - a bug that manifests itself in running software after a programmer notices that the code should never have worked in the first place.

When you gaze long into BMPMAN, BMPMAN also gazes into you.

"I am one of the best FREDders on Earth" -General Battuta

<Aesaar> literary criticism is vladimir putin

<MageKing17> "There's probably a reason the code is the way it is" is a very dangerous line of thought. :P
<MageKing17> Because the "reason" often turns out to be "nobody noticed it was wrong".
(the very next day)
<MageKing17> this ****ing code did it to me again
<MageKing17> "That doesn't really make sense to me, but I'll assume it was being done for a reason."
<MageKing17> **** ME
<MageKing17> THE REASON IS PEOPLE ARE STUPID
<MageKing17> ESPECIALLY ME

<MageKing17> God damn, I do not understand how this is breaking.
<MageKing17> Everything points to "this should work fine", and yet it's clearly not working.
<MjnMixael> 2 hours later... "God damn, how did this ever work at all?!"
(...)
<MageKing17> so
<MageKing17> more than two hours
<MageKing17> but once again we have reached the inevitable conclusion
<MageKing17> How did this code ever work in the first place!?

<@The_E> Welcome to OpenGL, where standards compliance is optional, and error reporting inconsistent

<MageKing17> It was all working perfectly until I actually tried it on an actual mission.

<IronWorks> I am useful for FSO stuff again. This is a red-letter day!
* z64555 erases "Thursday" and rewrites it in red ink

<MageKing17> TIL the entire homing code is held up by shoestrings and duct tape, basically.

 

Offline General Battuta

  • Poe's Law In Action
  • 214
  • i wonder when my postcount will exceed my iq
Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Holy ****

 

Offline Spoon

  • 212
  • ヾ(´︶`♡)ノ
Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
So basically Ph'nglui mglw'nafh Codethulhu GitHub wgah'nagl fhtagn.
Codethulhu attempts to drain the sanity of yet another poor coder.

Interesting read, also very relatable.
Urutorahappī!!

[02:42] <@Axem> spoon somethings wrong
[02:42] <@Axem> critically wrong
[02:42] <@Axem> im happy with these missions now
[02:44] <@Axem> well
[02:44] <@Axem> with 2 of them

 

Offline MatthTheGeek

  • Captain Obvious
  • 212
  • Frenchie McFrenchface
Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Spaghetti code debugging at its finest.
People are stupid, therefore anything popular is at best suspicious.

Mod management tools     -     Wiki stuff!     -     Help us help you

666maslo666: Releasing a finished product is not a good thing! It is a modern fad.

SpardaSon21: it seems like you exist in a permanent state of half-joking misanthropy

Axem: when you put it like that, i sound like an insane person

bigchunk1: it's not retarded it's american!
bigchunk1: ...

batwota: steele's maneuvering for the coup de gras
MatthTheGeek: you mispelled grâce
Awaesaar: grace
batwota: oh right :P
Darius: ah!
Darius: yes, i like that
MatthTheGeek: the way you just spelled it it means fat
Awaesaar: +accent I forgot how to keyboard
MatthTheGeek: or grease
Darius: the killing fat!
Axem: jabba does the coup de gras
MatthTheGeek: XD
Axem: bring me solo and a cookie

 

Offline General Battuta

  • Poe's Law In Action
  • 214
  • i wonder when my postcount will exceed my iq
Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
It's funny how when a thing breaks, and you check all the possible ways it could've broken, and then figure out a few more ways it could've broken, it's always the thing you didn't quite check.

 

Offline AdmiralRalwood

  • 211
  • The Cthulhu programmer himself!
    • Skype
    • Steam
    • Twitter
Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
It's funny how when a thing breaks, and you check all the possible ways it could've broken, and then figure out a few more ways it could've broken, it's always the thing you didn't quite check.
I tried so hard to search the code for any non-temporary subsystem pointers, too; there's just so much code to sort through that it's all too easy to miss something vitally important.
Ph'nglui mglw'nafh Codethulhu GitHub wgah'nagl fhtagn.

schrödinbug (noun) - a bug that manifests itself in running software after a programmer notices that the code should never have worked in the first place.

When you gaze long into BMPMAN, BMPMAN also gazes into you.

"I am one of the best FREDders on Earth" -General Battuta

<Aesaar> literary criticism is vladimir putin

<MageKing17> "There's probably a reason the code is the way it is" is a very dangerous line of thought. :P
<MageKing17> Because the "reason" often turns out to be "nobody noticed it was wrong".
(the very next day)
<MageKing17> this ****ing code did it to me again
<MageKing17> "That doesn't really make sense to me, but I'll assume it was being done for a reason."
<MageKing17> **** ME
<MageKing17> THE REASON IS PEOPLE ARE STUPID
<MageKing17> ESPECIALLY ME

<MageKing17> God damn, I do not understand how this is breaking.
<MageKing17> Everything points to "this should work fine", and yet it's clearly not working.
<MjnMixael> 2 hours later... "God damn, how did this ever work at all?!"
(...)
<MageKing17> so
<MageKing17> more than two hours
<MageKing17> but once again we have reached the inevitable conclusion
<MageKing17> How did this code ever work in the first place!?

<@The_E> Welcome to OpenGL, where standards compliance is optional, and error reporting inconsistent

<MageKing17> It was all working perfectly until I actually tried it on an actual mission.

<IronWorks> I am useful for FSO stuff again. This is a red-letter day!
* z64555 erases "Thursday" and rewrites it in red ink

<MageKing17> TIL the entire homing code is held up by shoestrings and duct tape, basically.

 

Offline Mongoose

  • Rikki-Tikki-Tavi
  • Global Moderator
  • 212
  • This brain for rent.
    • Minecraft
    • Steam
    • Something
Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
I feel bad for loving these sorts of stories as much as I do, because I know there must be a pile of pulled-out hair sitting next to the keyboard associated with them.

 

Offline 666maslo666

  • 28
  • Artificial Neural Network
Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
At least it crashed. I think the worst bugs are logic bugs where debugger shows nothing but there is something subtly wrong. I am just an amateur gamedev and one of those bad boys made me contemplate giving up on my hobby and go collect stamps instead: all my meshes were rotating correctly except for simple quads, which were rotating, but not always as they should. After lots of trying, I gave up on that bug as unfixable and kinda worked around it if needed. Months later, I randomly noticed a possible bug in model loading code, and after fixing it, my quads were suddenly rotating as they should. Turns out the quad model geometry (and only the quad model geometry) failed to load. But the model was still valid and could be rendered as a quad (two triangles, six vertices), except that it took those vertices from the model in the buffer that was next in line. Which just happened to be a cube. And when you render only six vertices of a cube, you get a quad. A quad with offset model center, but a still quad nonetheless. So all this time, I was using top side of a cube as a quad without knowing it..

And then I realize that FS codebase is 100x larger than my amateur engine and written by many people..  :eek2:
« Last Edit: July 04, 2016, 01:00:54 pm by 666maslo666 »
"For once you have tasted flight you will walk the earth with your eyes turned skywards, for there you have been and there you will long to return." - Leonardo da Vinci

Arguing on the internet is like running in the Special Olympics. Even if you win you are still retarded.

 

Offline DefCynodont119

  • 210
  • Ascended GTSC-Faustus Artist
    • Steam
Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
This, This Is the best post ever, Thank you.

Just To illustrate how elusive the ai_big_strafe_maybe_attack_turret() crash was:

After 3+ hours of the testing over and over, I was only able to trigger two errors. One was an unexplained CTD, and the other was a bizarre case where: 1, after 10-15 seconds of doing the thing that used to make it crash the sound started cutting in and out, 2, then the weapon/engine effects flickered then disappeared, 3, the whole screen just went black. this happened over the course of about 1.5 minuets.   :shaking: unfortunately i was not in Debug when that happened.


However; most of my efforts to wreck the game caused nothing.
I can no longer reproduce the crash with more then 7ish% chance of it happening as far as I can tell.

That was me giving up on just reproducing the error when AdmiralRalwood pushed on, venturing deep into the temple of code, risking sanity and injury to save the world from entire legions of crashes that outwardly look the same.

« Last Edit: July 04, 2016, 03:56:08 pm by DefCynodont119 »
My gift from Freespace to Cities Skylines:  http://steamcommunity.com/sharedfiles/filedetails/?id=639891299

 

Offline niffiwan

  • 211
  • Eluder Class
Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Thank you for a very entertaining read sir

* niffiwan tips his hat
Creating a fs2_open.log | Red Alert Bug = Hex Edit | MediaVPs 2014: Bigger HUD gauges | 32bit libs for 64bit Ubuntu
----
Debian Packages (testing/unstable): Freespace2 | wxLauncher
----
m|m: I think I'm suffering from Stockholm syndrome. Bmpman is starting to make sense and it's actually written reasonably well...

 

Offline AdmiralRalwood

  • 211
  • The Cthulhu programmer himself!
    • Skype
    • Steam
    • Twitter
Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
That was me giving up on just reproducing the error when AdmiralRalwood pushed on, venturing deep into the temple of code, risking sanity and injury to save the world from entire legions of crashes that outwardly look the same.
entire legions of crashes
legion
I see what you did there.
Ph'nglui mglw'nafh Codethulhu GitHub wgah'nagl fhtagn.

schrödinbug (noun) - a bug that manifests itself in running software after a programmer notices that the code should never have worked in the first place.

When you gaze long into BMPMAN, BMPMAN also gazes into you.

"I am one of the best FREDders on Earth" -General Battuta

<Aesaar> literary criticism is vladimir putin

<MageKing17> "There's probably a reason the code is the way it is" is a very dangerous line of thought. :P
<MageKing17> Because the "reason" often turns out to be "nobody noticed it was wrong".
(the very next day)
<MageKing17> this ****ing code did it to me again
<MageKing17> "That doesn't really make sense to me, but I'll assume it was being done for a reason."
<MageKing17> **** ME
<MageKing17> THE REASON IS PEOPLE ARE STUPID
<MageKing17> ESPECIALLY ME

<MageKing17> God damn, I do not understand how this is breaking.
<MageKing17> Everything points to "this should work fine", and yet it's clearly not working.
<MjnMixael> 2 hours later... "God damn, how did this ever work at all?!"
(...)
<MageKing17> so
<MageKing17> more than two hours
<MageKing17> but once again we have reached the inevitable conclusion
<MageKing17> How did this code ever work in the first place!?

<@The_E> Welcome to OpenGL, where standards compliance is optional, and error reporting inconsistent

<MageKing17> It was all working perfectly until I actually tried it on an actual mission.

<IronWorks> I am useful for FSO stuff again. This is a red-letter day!
* z64555 erases "Thursday" and rewrites it in red ink

<MageKing17> TIL the entire homing code is held up by shoestrings and duct tape, basically.

 

Offline Goober5000

  • HLP Loremaster
  • Moderator
  • 214
    • Goober5000 Productions
Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
Awesome read.  Awesome bugfix.

AdmiralRalwood, can I commission you to write the story of my subsystem-look-at refactor?

 

Offline AdmiralRalwood

  • 211
  • The Cthulhu programmer himself!
    • Skype
    • Steam
    • Twitter
Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
AdmiralRalwood, can I commission you to write the story of my subsystem-look-at refactor?
I'm flattered, but I doubt I could do it justice when I wasn't the one working on it... which, unfortunately, I don't understand the vector math either.
Ph'nglui mglw'nafh Codethulhu GitHub wgah'nagl fhtagn.

schrödinbug (noun) - a bug that manifests itself in running software after a programmer notices that the code should never have worked in the first place.

When you gaze long into BMPMAN, BMPMAN also gazes into you.

"I am one of the best FREDders on Earth" -General Battuta

<Aesaar> literary criticism is vladimir putin

<MageKing17> "There's probably a reason the code is the way it is" is a very dangerous line of thought. :P
<MageKing17> Because the "reason" often turns out to be "nobody noticed it was wrong".
(the very next day)
<MageKing17> this ****ing code did it to me again
<MageKing17> "That doesn't really make sense to me, but I'll assume it was being done for a reason."
<MageKing17> **** ME
<MageKing17> THE REASON IS PEOPLE ARE STUPID
<MageKing17> ESPECIALLY ME

<MageKing17> God damn, I do not understand how this is breaking.
<MageKing17> Everything points to "this should work fine", and yet it's clearly not working.
<MjnMixael> 2 hours later... "God damn, how did this ever work at all?!"
(...)
<MageKing17> so
<MageKing17> more than two hours
<MageKing17> but once again we have reached the inevitable conclusion
<MageKing17> How did this code ever work in the first place!?

<@The_E> Welcome to OpenGL, where standards compliance is optional, and error reporting inconsistent

<MageKing17> It was all working perfectly until I actually tried it on an actual mission.

<IronWorks> I am useful for FSO stuff again. This is a red-letter day!
* z64555 erases "Thursday" and rewrites it in red ink

<MageKing17> TIL the entire homing code is held up by shoestrings and duct tape, basically.

 
Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
This wouldn't have happened if you'd used Rust!!!
The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell.

 

Offline niffiwan

  • 211
  • Eluder Class
Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
you mean that FSO wouldn't have happened? :p
Creating a fs2_open.log | Red Alert Bug = Hex Edit | MediaVPs 2014: Bigger HUD gauges | 32bit libs for 64bit Ubuntu
----
Debian Packages (testing/unstable): Freespace2 | wxLauncher
----
m|m: I think I'm suffering from Stockholm syndrome. Bmpman is starting to make sense and it's actually written reasonably well...

 

Offline AdmiralRalwood

  • 211
  • The Cthulhu programmer himself!
    • Skype
    • Steam
    • Twitter
Re: AdmiralRalwood and the Chamber of Secr-- er, the SEXP of Doom
First nightly build with this fix (as well as APNG support) is here.
Ph'nglui mglw'nafh Codethulhu GitHub wgah'nagl fhtagn.

schrödinbug (noun) - a bug that manifests itself in running software after a programmer notices that the code should never have worked in the first place.

When you gaze long into BMPMAN, BMPMAN also gazes into you.

"I am one of the best FREDders on Earth" -General Battuta

<Aesaar> literary criticism is vladimir putin

<MageKing17> "There's probably a reason the code is the way it is" is a very dangerous line of thought. :P
<MageKing17> Because the "reason" often turns out to be "nobody noticed it was wrong".
(the very next day)
<MageKing17> this ****ing code did it to me again
<MageKing17> "That doesn't really make sense to me, but I'll assume it was being done for a reason."
<MageKing17> **** ME
<MageKing17> THE REASON IS PEOPLE ARE STUPID
<MageKing17> ESPECIALLY ME

<MageKing17> God damn, I do not understand how this is breaking.
<MageKing17> Everything points to "this should work fine", and yet it's clearly not working.
<MjnMixael> 2 hours later... "God damn, how did this ever work at all?!"
(...)
<MageKing17> so
<MageKing17> more than two hours
<MageKing17> but once again we have reached the inevitable conclusion
<MageKing17> How did this code ever work in the first place!?

<@The_E> Welcome to OpenGL, where standards compliance is optional, and error reporting inconsistent

<MageKing17> It was all working perfectly until I actually tried it on an actual mission.

<IronWorks> I am useful for FSO stuff again. This is a red-letter day!
* z64555 erases "Thursday" and rewrites it in red ink

<MageKing17> TIL the entire homing code is held up by shoestrings and duct tape, basically.