Hard Light Productions Forums

Announcements => Announcements => Topic started by: Zacam on November 19, 2012, 04:15:55 am

Title: Unplanned Outage and Maintenance
Post by: Zacam on November 19, 2012, 04:15:55 am
Well, with all the fun that has been taking place around here what with script kiddies getting rambunctious and the like, we didn't happen to realize a few things.

Like the fact that the main machine was running out of drive space. And that some of the log files, not rotating properly, were getting MUCH larger than they should have.
Add to that, at some point, the system restarted or the services for it timed out in some fashion. One of those services being MySQL and since it ended unclean on a full disk, the forum Database decided it was corrupted.

And so, by the exhaustive efforts of rev_posix specifically and myself, we've managed to get all the drive space reclaimed and the database -theINory- restored.
If you find that a post that you made at some point can no longer be found, or that an attachment no longer downloads, feel free to let me know this, but be advised ahead of time that there might not be anything that can be done about it.

I'd like to thank everybody for their patience and apologise for the amount of unexpected downtime. What made this even hairier for us is that our cable service provider decided to have an outage of their own while we were in the midst of getting things operational. In short, a comedy of Murphian proportions took place.
Title: Re: Unplanned Outage and Maintenance
Post by: headdie on November 19, 2012, 04:34:58 am
**** you guys have had a rough time lately.  many thanks for your efforts getting this place back up and running :D
Title: Re: Unplanned Outage and Maintenance
Post by: Rodo on November 19, 2012, 07:58:15 am
I felt naked there for a few hours.

Thanks for restoring the forums, you make me happy :D
Title: Re: Unplanned Outage and Maintenance
Post by: Crybertrance on November 19, 2012, 12:42:12 pm
Quote
I felt naked there for a few hours.

Thanks for restoring the forums, you make me happy :D

You weren't the only one...
Title: Re: Unplanned Outage and Maintenance
Post by: FireSpawn on November 19, 2012, 05:18:34 pm
All hail our benevolent HLP overlords!
Title: Re: Unplanned Outage and Maintenance
Post by: rev_posix on November 19, 2012, 11:25:51 pm
I'm just glad that the DB dump of the forums was pretty current and happened before the corruption took effect.  Losing it would have sucked, as the next most recent backup was about a week old.
Title: Re: Unplanned Outage and Maintenance
Post by: Mongoose on November 29, 2012, 08:24:16 pm
A few people seem to have stumbled across a couple of longer posts that have been truncated...the weird thing is that these posts were made a good year or two ago, well before any of the actual database issues.  Lorric's post here (http://www.hard-light.net/forums/index.php?topic=79476.msg1572967#msg1572967) is one of them, and the other is Battuta's interview with Jason Scott here (http://www.hard-light.net/forums/index.php?topic=74745.msg1477326#msg1477326).  (Seriously, of all the posts to get hit...)  Fortunately in both cases we have backups of the original posts, so it's not exactly urgent, but I figured there might be something useful to figure out on the back-end.
Title: Re: Unplanned Outage and Maintenance
Post by: karajorma on November 29, 2012, 08:30:10 pm
The Diaspora release thread also was truncated. Again I had a backup.
Title: Re: Unplanned Outage and Maintenance
Post by: Zacam on November 29, 2012, 11:03:03 pm
Database corruption doesn't care when the post was made as far as what it will munch. And usually, the longest possible entries will be the most likely hit, regardless of when they were made.

That's why it's called corruption.
Title: Re: Unplanned Outage and Maintenance
Post by: Mongoose on November 29, 2012, 11:47:03 pm
True, but rev's post made me think that you guys were restoring stuff from a backup before the corruption, which is why I was puzzled.
Title: Re: Unplanned Outage and Maintenance
Post by: rev_posix on November 30, 2012, 12:15:08 am
I did.  However, something might have gone weird during the conversion from SMF 2.0RC to the current, which this sounds like to me, especially if the posts are all chopped at about the same length.

If you have backups, kewl.  If not, anyone with shell could ssh in and pull the missing data from the backup I used to recreate the backend.
Title: Re: Unplanned Outage and Maintenance
Post by: Goober5000 on November 30, 2012, 12:20:42 am
Quote
I did.  However, something might have gone weird during the conversion from SMF 2.0RC to the current, which this sounds like to me, especially if the posts are all chopped at about the same length.
Didn't you read that email I sent to you and Zacam? :p  I theorized that the database entries could have been truncated at an apostrophe, and so far, every truncated post I've seen has fit that pattern.  And there are quite a lot of truncated posts -- including some on hosted internals with fairly important project information on them.
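
The apostrophe theory is easy to test against any single post, given both the backup copy and the live copy of its text. Here is a minimal sketch in Python; the function name and the `marker` parameter are made up for illustration, and it is only an assumption that the apostrophe is stored as a literal `'` in the database (if it is stored as an HTML entity such as `&#039;`, pass that as the marker instead).

```python
def truncated_at_apostrophe(backup_body, live_body, marker="'"):
    """Return True if live_body is backup_body cut off right before `marker`.

    `marker` is whatever form the apostrophe takes in the stored text;
    the default of a literal "'" is an assumption, not a known fact.
    """
    # A truncated post must be strictly shorter than the backup copy.
    if len(live_body) >= len(backup_body):
        return False
    # It must also be an exact prefix of the backup copy.
    if not backup_body.startswith(live_body):
        return False
    # The first missing piece of text must be the apostrophe marker.
    return backup_body.startswith(marker, len(live_body))


print(truncated_at_apostrophe("It's a long post", "It"))
print(truncated_at_apostrophe("It's a long post", "It's a"))
```

A post that was edited by its author after the backup was taken would fail the prefix check, so it would not be flagged by mistake.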

Quote
If you have backups, kewl.  If not, anyone with shell could ssh in and pull the missing data from the backup I used to recreate the backend.
The problem is that this requires someone to notice that there are posts with missing data.  And the posts which have been truncated don't appear to fit any pattern, other than that they may have been truncated at apostrophes.  Couldn't we take the forums offline for a few hours and just run a comparison of all posts on the forum against all posts in the backup?
Title: Re: Unplanned Outage and Maintenance
Post by: mjn.mixael on November 30, 2012, 12:29:11 am
BAH! I've lost at least two big posts... one of them a pretty important internal BtA post, the other a tutorial on the Render Boutique. Sadly, I don't have backups. :(
Title: Re: Unplanned Outage and Maintenance
Post by: rev_posix on November 30, 2012, 12:32:29 am
Quote
The problem is that this requires someone to notice that there are posts with missing data.  And the posts which have been truncated don't appear to fit any pattern, other than that they may have been truncated at apostrophes.  Couldn't we take the forums offline for a few hours and just run a comparison of all posts on the forum against all posts in the backup?
In theory, yes.

The simplest way I could think of to do this would be to disallow all access to the forums (probably shutting down apache to be sure), dump all 2.2 gigs of raw SQL, perform a diff on the backup and the new dump (they are just text files), and then look at each difference to see whether it is a post or something else, like a new timestamp, new post, new user, etc.
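
On a small scale, the diff step could look something like the sketch below. It uses Python's standard `difflib` and assumes both dumps fit in memory as lists of lines, which would not hold for 2.2 gigs; a real run would more likely use a streaming command-line diff. The function name, the `smf_messages` table name, and the sample INSERT lines are made up for illustration.

```python
from difflib import SequenceMatcher


def dump_differences(backup_lines, current_lines):
    """List the lines that differ between two SQL dumps.

    Each result is a (source, line) pair, where source is "backup" for a
    line found only in the backup and "current" for one found only in
    the new dump.
    """
    matcher = SequenceMatcher(a=backup_lines, b=current_lines, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical stretch, nothing to review
        changes.extend(("backup", line) for line in backup_lines[i1:i2])
        changes.extend(("current", line) for line in current_lines[j1:j2])
    return changes


# Hypothetical dump lines, not taken from the real database.
backup = [
    "INSERT INTO smf_messages VALUES (1, 'It''s a long post');",
    "INSERT INTO smf_messages VALUES (2, 'short');",
]
current = [
    "INSERT INTO smf_messages VALUES (1, 'It');",
    "INSERT INTO smf_messages VALUES (2, 'short');",
    "INSERT INTO smf_messages VALUES (3, 'new post');",
]

for source, line in dump_differences(backup, current):
    print(source, line)
```

Every reported pair still has to be checked by hand or by a second pass, since a new post and a truncated post both show up as differences.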

You volunteering to do so?  :P
Title: Re: Unplanned Outage and Maintenance
Post by: Goober5000 on November 30, 2012, 12:41:13 am
NO U. :p

Actually, what I was thinking was loading all 2.2 gigs of raw SQL into a parallel table, then writing a custom program to iterate through every post (and private message) and just compare the lengths.  (Or, if there are character encoding problems, compare the number of alphanumeric characters, while ignoring the special characters.)  But that's the programmer in me thinking, not the sysadmin.
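
As a rough sketch of that comparison in Python: assume the posts from the backup and from the live forum have already been loaded into two dictionaries keyed by message ID (how they get loaded is left out, and all names here are hypothetical). The `measure` argument switches between plain length and the alphanumeric-only count suggested for the encoding case.

```python
def alnum_count(body):
    """Count only letters and digits, ignoring special characters."""
    return sum(1 for ch in body if ch.isalnum())


def find_shortened_posts(backup_posts, live_posts, measure=len):
    """Return (post_id, backup_size, live_size) for posts that shrank.

    backup_posts and live_posts map message ID -> post body.
    measure is len by default; pass alnum_count to sidestep
    character encoding differences.
    """
    shortened = []
    for post_id, backup_body in backup_posts.items():
        live_body = live_posts.get(post_id)
        if live_body is None:
            continue  # missing entirely from the live forum, a separate problem
        backup_size = measure(backup_body)
        live_size = measure(live_body)
        if live_size < backup_size:
            shortened.append((post_id, backup_size, live_size))
    return sorted(shortened)


# Hypothetical sample data.
backup_posts = {1: "It's a long post", 2: "short", 3: "gone"}
live_posts = {1: "It", 2: "short", 4: "new"}

print(find_shortened_posts(backup_posts, live_posts))
print(find_shortened_posts(backup_posts, live_posts, measure=alnum_count))
```

A post that somebody legitimately edited down after the backup was taken would also be reported, so the output is a list of candidates to check rather than a list of confirmed truncations.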

Whatever way we do it, though, it needs to get done.  There are just too many truncated posts, and undoubtedly we haven't found them all.
Title: Re: Unplanned Outage and Maintenance
Post by: Zacam on November 30, 2012, 02:00:57 am
Goober: NO U, now shuddup and listen.

1: You can't do a diff because there are conversion changes between SMF 2.0.2 DB and RC5. Fail. And across 2.2 gigs? No, throwing twin DB's into a mastered array and doing a sync would be faster but still computationally expensive as all sin.
1A: Also, the above is only really feasible in Sybase. MySQL, wonderful as it is, just doesn't have what it would take to do something like this elegantly OR swiftly.

2: Easier method: Since these are all mostly older posts: Mount a pre-collapse DB on an iteration of the forum software with a unique URL, then between two tabs, go to the post ID and copy from one to the other.

Not only is number two easier AND guaranteed to get correctly formatted into the newer DB structure, it's also far less hassle to accomplish. All it requires is for meticulous individuals to report truncated post message #'s as they are found on the forum. Or directly link them. Then modify the link in a second tab/browser/whatever to the alternate URL forum and make with the CTRL+C and CTRL+V like a boss.
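
The link-rewriting step can be done by hand, but as an illustration, here is a small Python sketch that swaps the host of a reported post link for the alternate forum. The host `archive.example.invalid` is a placeholder, since the actual alternate URL is not given here, and the sketch assumes the alternate forum keeps the same path and message IDs.

```python
from urllib.parse import urlsplit

# Placeholder host for the pre-collapse copy of the forum (an assumption).
ARCHIVE_HOST = "archive.example.invalid"


def to_archive_url(post_url, archive_host=ARCHIVE_HOST):
    """Point a forum post link at the alternate forum, keeping topic and message IDs."""
    parts = urlsplit(post_url)
    # Only the host changes; path, query string, and #msg anchor stay the same.
    return parts._replace(netloc=archive_host).geturl()


print(to_archive_url(
    "http://www.hard-light.net/forums/index.php?topic=79476.msg1572967#msg1572967"
))
```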

Granted, it doesn't catch every possible instance for a "full clean" version, but given what it'll take to achieve option 1, it's the more productive "right now" result to run with.

Check the Admin internal for more information.

For everybody else: Check this out and report away; http://www.hard-light.net/forums/index.php?topic=83008.0
Title: Re: Unplanned Outage and Maintenance
Post by: jr2 on December 01, 2012, 04:29:53 pm
How about 2A; Do 2), then convert from SMF 2.0.2 to RC5, to get rid of the conversion changes; then do a db compare-whatever-you-call it? 

Granted, I have no clue as to if that would be feasible, but it seems to me to be logically possible.
Title: Re: Unplanned Outage and Maintenance
Post by: Zacam on December 01, 2012, 06:06:24 pm
You can't reverse 'upgrade' like that and there is no guarantee that the results (even if you could) wouldn't end up being for the worse.