
Epic Bughunt Claims Sanity, Human Blood


General Battuta:
About fifty hours ago we were in the process of wrapping up another mission. This was a pretty FRED-intensive mission using some relatively complex stuff: variables, set-object-positions using relative coordinates, skybox changes and the like.

I was pretty pleased with the outcome. We'd iterated this mission a lot, and this was the third revision of the concept. Finally, after days of work stretched across months of real time, it felt like it had worked out pretty well.

I made a few changes and ran through the mission for the Nth time to make sure everything was working.

About halfway through the mission, right when a critical event fired, the world was devoured. The ships, the skybox, even the player's own vessel all vanished, and the screen was filled with inky blackness. All that was left was the HUD and the sun, and even those weren't working right: you could still target objects, but they were (like everything else) invisible, and the distance and speed monitors were both blank. Even the ambient hum of the engines was gone. The 3D radar, meanwhile, had migrated up into the spot where head .anis are usually displayed.

The game's rendering engine seemed like it had just quit. Given up the ghost. All that was left were the 2D elements. It was as if Ransom's Transcendent had decided to pay a visit and eat reality.

I stared at this in bemusement and horror, posted a brief note on IRC, and restarted the mission.

Same thing. Same point in the mission.

I looked at the event in question. There was a lot of stuff going on right then: skybox changes, coordinate changes, coordinates pulled from variables, engine subsystem repairs, play-sound-from-files, fadeouts. It was a nightmare. Any one of those elements (or a combination thereof) could hypothetically be pushing the engine too far.

But it wasn't worth panicking until it could be reproduced. I posted the mission on SVN and, sure enough, The_E and others got the same result too. But they didn't get it every time. I kept changing things in the mission and testing, and sometimes I'd see it work perfectly, and I'd commit it with a happy note like IT'S FIXED...and then The_E would try the 'fixed' version and hit the same bug.

Now, for a bit of background. We'd met this bug before. Previously, we'd always encountered it in relation to a bad asset: first the Vishnan Keeper, then the Anemoi. In both cases we figured we'd solved it by cleaning up the .pof.

Yet none of the assets in this mission were bad, so far as we could tell. For some reason, the mission was eating itself at this point.

First we tried debug logs. They were useless. That must have eaten up an hour or so.

Next, The_E built a debug filter to get additional information. While he was doing that, I started systematically pulling newer SEXPs away from the event to see what could be causing the quit. Surely the coordinate manipulation or skybox change could explain it. Or perhaps it was the every-time conditional used in a linked event; every-time is notoriously hazardous.

Nothing helped. I pared the event down to two SEXPs: an engine subsystem repair and a HUD re-enable. Since the engine repair was a retail SEXP, and the HUD re-enable wasn't causing it (I could move it around with no effect), we were right back where we started.

Now bear in mind I'd been working on the third iteration of this mission for a great many hours already. I was sick, I was exhausted, and somehow I'd managed to break a glass during the trial-and-error debugging process, cut myself, and start bleeding everywhere.

The_E reported in. He'd been backtracking a stream of errors, starting in the sound code and moving all the way back to what appeared to be the original cause. He figured out the cause of the problem, but not the trigger. Somehow the engine was losing track of the player's position, converting a valid vector into NaN (not a number). There was no error checking here, so as I understand it, the error propagated through the whole engine, smashing things apart like a Shivan hitchhiker on a Thoth.
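For anyone who hasn't met NaN before, here is a minimal Python sketch of why an unchecked NaN is so destructive. It is not engine code: the vector helpers, positions and the 1000-unit range check are hypothetical stand-ins. Any arithmetic involving a NaN produces NaN, and every ordered comparison against NaN is false, so range and visibility checks quietly fail instead of raising an error.

```python
import math

# Hypothetical 3D vector helpers; not taken from the FreeSpace 2 source.
def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(v, s):
    return tuple(x * s for x in v)

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

nan = float("nan")

player_pos = (100.0, 0.0, 50.0)
bad_velocity = (nan, 0.0, 0.0)   # one poisoned component is enough

# One frame of integration carries the NaN into the position...
player_pos = add(player_pos, scale(bad_velocity, 1 / 60))
# ...and anything computed from that position inherits it.
dist_to_target = distance(player_pos, (0.0, 0.0, 0.0))

print(player_pos)                 # (nan, 0.0, 50.0)
print(dist_to_target)             # nan
print(dist_to_target < 1000.0)    # False: "not in range"
print(dist_to_target >= 1000.0)   # False: "not out of range" either

# A cheap guard: reject non-finite vectors at the point they are produced.
def is_valid(v):
    return all(math.isfinite(x) for x in v)

print(is_valid(player_pos))       # False
```

Checks like `is_valid` are cheap, which is why engines often sprinkle them (or debug-only asserts) at the points where positions and orientations get written, so the first bad value is caught near its source rather than several systems downstream.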

But he couldn't figure out what caused the NaN error in the first place. We knew it always happened at the time this event fired, but now that it had been stripped down to fundamentally 'safe' levels, how could it cause the problem?

Blinded by RAGE at this bizarre bug crippling my brilliant mission, I started swapping out all the custom assets for MediaVPs ones.

And the bug went away.

I boggled. We had been certain it wasn't caused by bad assets this time. But every time I tried the mission (time-compressing through to the point of the bug) with the Perseus swapped in for the optimized Uhlan, the bug didn't seem to happen.

Except for The_E. He was able to reproduce it with the HTL Perseus. Which meant it couldn't be the Uhlan either.

At this point we were all taking this bug very personally. The_E started doing something very technical which he could probably explain better, but which seemed to involve running the engine forward frame-by-frame, looking for the moment when everything went horribly wrong in the player eyepoint vector. This was an insanely laborious process that, to the best of my knowledge, ate up hours of his time.

Take a moment to think about the timeline here. By this point, more than 24 hours had passed since we'd first posted about this bug on an SCP internal forum and Mantised a test mission reproducing it. Nobody had a bloody clue what was going on.

Finally, almost 48 hours (if I remember right) after first encountering it, I did something nonsensical but ultimately productive: I tried deleting the repair-engine SEXP from the event. This was a retail SEXP, so it should have been safe.

The bug went away. The mission kept on happily rendering. There was no inky void of all-consuming night. Yet there was no way that SEXP could have caused it, because sometimes - maddeningly - the mission worked fine even with it in place.

Truly this was the bug from hell.

Mere minutes later, working independently, IssMneur on #SCP struck gold. "Hey," he said (I paraphrase here.) "Have you been matching speed with something during the mission?"

We all ran through the mission, matching speed with a certain ship. The rendering engine quit when the engine-repair SEXP fired.

We ran through again without matching speed. The mission worked fine.

Matching speed was the source of the problem. When I'd swapped in the Perseus for the Uhlan earlier, I'd simply time-compressed to the failure point without bothering to 'play' the mission, and thus never matched speed. That's why the bug had gone away there.

After some frenzied diagnosis, IssMneur and The_E explained what had happened. Apparently, if the player was matching speed with a target ship at the time that they were disabled, but then later repaired, the game engine would divide zero by zero at the moment of the repair and produce a nonsensical result which worked its way into the player's position. The rendering engine would fumble, choke, and stop working.
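The post doesn't say which calculation in the engine performed the 0/0, so the following Python sketch only illustrates one classic way this happens in game code: normalizing a zero-length vector, here the velocity of a ship that has been stopped dead while disabled. `normalize`, `normalize_safe` and the `eps` threshold are hypothetical. Python itself raises `ZeroDivisionError` for float division by zero, so `ieee_div` emulates the IEEE 754 result (0/0 is NaN) that C or C++ float arithmetic hands back silently.

```python
import math

def ieee_div(a, b):
    """Emulate IEEE 754 float division, which C/C++ performs without complaint.

    Python raises ZeroDivisionError instead, so handle b == 0 explicitly.
    """
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan                                        # 0/0 -> NaN
        return math.copysign(math.inf, a) * math.copysign(1.0, b)  # x/0 -> +/-inf
    return a / b

def length(v):
    return math.sqrt(sum(x * x for x in v))

# Hypothetical unguarded helper: fine for any non-zero vector...
def normalize(v):
    n = length(v)
    return tuple(ieee_div(x, n) for x in v)

# ...but a ship whose engines were disabled has stopped dead.
disabled_velocity = (0.0, 0.0, 0.0)
heading = normalize(disabled_velocity)
print(heading)                              # (nan, nan, nan)

# The usual fix: treat a (near-)zero vector as a special case instead of dividing.
def normalize_safe(v, eps=1e-6):
    n = length(v)
    if n < eps:
        return (0.0, 0.0, 0.0)
    return tuple(x / n for x in v)

print(normalize_safe(disabled_velocity))    # (0.0, 0.0, 0.0)
print(normalize_safe((3.0, 0.0, 4.0)))      # (0.6, 0.0, 0.8)
```

What the safe version returns for a zero vector (zero, the previous heading, the ship's forward axis) is a design choice; the important part is that the division never happens with a zero denominator.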

As far as I know this could have happened in retail.

They built a patch for the issue and promptly generated a new build that fixed it. I cleaned the blood spatter off my keyboard.

Then the mission started crashing for a totally different reason. But that's a story for another time, and now it's fixed too.

The_E may be able to provide more insight into the hilariously random and difficult-to-reproduce technical side of the bug. But that's my part of the story.

Droid803:
The lesson:

Don't use match speed.
Use the Z key like a real man! :P

(I know I'm missing the point)

Spoon:
Oh wow, just wow
Hilarious, but at the same time I can completely understand the frustration this must have caused.

I encountered this black void of doom only once myself. I had just built a new ship and was going to test it out in-game; the ship warped in... and then vanished. Then I moved my own ship forward and everything went into a black hole. That scared the living **** out of me, but then it never showed up again.

Sushi:
The real lesson:

Don't assume that retail stuff works right. :)


Also, major kudos to those of you who fought and defeated this behemoth.

MatthTheGeek:
I remember I also managed to reach the black void of doom while playing around with Fury's AI, when I made a typo that assigned a ship to an AI class that didn't exist :/

Ah, the delights of modding... :D
