Author Topic: Alternative to VP-files - compressed archives (read before freaking out)  (Read 33944 times)

Offline The E

  • He's Ebeneezer Goode
  • Global Moderator
  • 213
  • Nothing personal, just tech support.
    • Steam
    • Twitter
Re: Alternative to VP-files - compressed archives (read before freaking out)
Not so much reduce download size, but to reduce install size.
Let there be light
Let there be moon
Let there be stars and let there be you
Let there be monsters and let there be pain
Let us begin to feel again
--Devin Townsend, Genesis

 

Offline CaptJosh

  • 210
Re: Alternative to VP-files - compressed archives (read before freaking out)
That too. Thanks for catching that I forgot to mention that.
CaptJosh

There are only 10 kinds of people in the world;
those who understand binary and those who don't.

 

Offline WMCoolmon

  • Purveyor of space crack
  • 213
Re: Alternative to VP-files - compressed archives (read before freaking out)
Taylor posted the spec for his idea for a CVP format on the private forum 4 years ago, but apparently it never got implemented. Here's the conversation:

I can't say that this isn't going to change a little bit before it actually hits CVS, but having played around with it for months last year, I think it's pretty solid.  It's largely the same as the current VP stuff; I wanted to keep it as close as possible to simplify the code.

(All values are little-endian.)

Header:
Sint8[4]  --  header id, "CPVP" for a CVP
Sint32    --  version #, initial version number for CVP is 1, and it needs to respect this (unlike the current standard VPs)
Sint32    --  offset to the file index (at end of VP/CVP)
Sint32    --  total uncompressed size of archive
Sint32    --  number of files in the archive
Uint32    --  CRC data (NOTE: this isn't a standard CRC32 value, FS2 has a code bug that we can't fix)

File Index (for each directory/file):
Sint32     --  archive offset of current file (NOTE: this is the real offset, to compressed data, not to uncompressed data)
Sint32     --  uncompressed size of the current file
Sint32     --  compressed size of the current file
Sint8[32]  --  file/directory name
Sint32     --  date/time of last file modification


Each individual file is compressed separately, NOT the entire archive.  The file index is also compressed, using level 9 compression.  Individual files can use any compression level so long as it's not above 6 (lower levels == faster, but less, compression).  Level 6 is recommended as the standard value though as it offers the best compromise between speed, size and memory usage for decompression.  The following file types should NOT be compressed:  WAV, OGG, MVE.  The reason is that those files are often streamed and streaming compressed files is extremely slow (so much so that I'm not even going to code in support to do it).  Also, OGG files don't compress worth a crap anyway.

The only thing guaranteed to be compressed in a CVP is the file index.  The exclusion types listed above should never be compressed.  Beyond that it should be user choice whether to actually compress any other data or not.
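
For anyone wanting to poke at the layout above, here is a minimal Python sketch of the two fixed-size records. Only the field order, the field widths, and the little-endian byte order come from the spec as posted; the field names, the helper functions, and the idea that the 32-byte name is NUL-padded are assumptions made for illustration. The index would also need to be decompressed before its records are parsed, which this sketch does not attempt.

```python
import struct
from collections import namedtuple

# Header: Sint8[4] id, Sint32 version, Sint32 index offset,
# Sint32 total uncompressed size, Sint32 file count, Uint32 CRC data.
HEADER = struct.Struct("<4siiiiI")       # 24 bytes, little-endian, no padding
# Index entry: Sint32 offset, Sint32 uncompressed size,
# Sint32 compressed size, Sint8[32] name, Sint32 modification time.
INDEX_ENTRY = struct.Struct("<iii32si")  # 48 bytes

# Hypothetical field names, chosen for this example only.
CvpHeader = namedtuple("CvpHeader", "magic version index_offset total_size num_files crc")
CvpEntry = namedtuple("CvpEntry", "offset size compressed_size name mtime")

def parse_header(buf):
    """Unpack the first 24 bytes of an archive into a CvpHeader."""
    return CvpHeader(*HEADER.unpack_from(buf, 0))

def parse_entry(buf, pos=0):
    """Unpack one 48-byte index record from already-decompressed index data."""
    offset, size, csize, raw_name, mtime = INDEX_ENTRY.unpack_from(buf, pos)
    # Assumption: names are NUL-padded ASCII, like a C char[32].
    name = raw_name.split(b"\0", 1)[0].decode("ascii", "replace")
    return CvpEntry(offset, size, csize, name, mtime)

# Round-trip a made-up header and entry to check the layout.
hdr = parse_header(HEADER.pack(b"CPVP", 1, 24, 1000, 1, 0))
ent = parse_entry(INDEX_ENTRY.pack(24, 1000, 400, b"ships.tbl", 0))
print(hdr.magic, hdr.version, ent.name, ent.compressed_size)
```

Using explicit `<` format strings keeps the record sizes fixed at 24 and 48 bytes regardless of platform alignment rules.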

Thanks. :)

Quote
Sint32    --  total uncompressed size of archive

I assume this takes into account the finished size of the archive? (eg not just the files that make it up, but including the header and index as well) Seems like a dumb question but I want to be totally sure we mean the same thing by archive.


Quote
Uint32    --  CRC data (NOTE: this isn't a standard CRC32 value, FS2 has a code bug that we can't fix)

Dare I ask further? Since this is a new format I don't see why a standard value isn't possible, even if fs2_open has messed up functions right now. I guess if it's being used for network stuff in the future...but that would necessitate a change to the network code, anyway.


Quote
Sint32     --  uncompressed size of the current file
Sint32     --  compressed size of the current file

Does uncompressed==compressed for all noncompressed files? (I'm looking for a better/more reliable way to determine compression status than file extensions).

Quote
I assume this takes into account the finished size of the archive? (eg not just the files that make it up, but including the header and index as well) Seems like a dumb question but I want to be totally sure we mean the same thing by archive.
No, it's just the data.  Header offset and index size are not included.  It's more for informational purposes than a function of use for the game.  The primary purpose for this is for utilities, not the game itself, so that you can quickly tell how much HD space would be required if you wanted to extract the entire CVP.  It's generally computed by the archiver anyway, for basic info on overall compression success, and simply storing it in the CVP header makes things easier for extraction than manually computing the total size.

Quote
Dare I ask further? Since this is a new format I don't see why a standard value isn't possible, even if fs2_open has messed up functions right now. I guess if it's being used for network stuff in the future...but that would necessitate a change to the network code, anyway.
It's used by the current/retail networking code, so if we change it then it breaks compatibility with all previous versions.  It's not really a big deal though and you can quite easily code up your own CRC code which matches what the game uses now (it's basically just missing an XOR on the crc value).  Though, now that I think about it, I'm not 100% sure that the CRC table is computed properly either.  But, again, it's just minor stuff overall since there are multiple incarnations of CRC32 anyway.
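
To make the "missing XOR" remark concrete, here is a small Python sketch of a standard CRC-32 with the final XOR undone. This only illustrates the general idea. It has not been checked against what the game produces, and as noted above the game's CRC table may differ as well, so treat the function name and the result as assumptions rather than as the FS2 checksum.

```python
import zlib

def crc32_without_final_xor(data: bytes) -> int:
    """Standard CRC-32 with the final XOR undone (not verified against FS2)."""
    # zlib.crc32 finishes by XOR-ing the register with 0xFFFFFFFF;
    # applying the same XOR again cancels that last step.
    return (zlib.crc32(data) ^ 0xFFFFFFFF) & 0xFFFFFFFF

sample = b"123456789"
print(hex(zlib.crc32(sample)))               # standard CRC-32
print(hex(crc32_without_final_xor(sample)))  # same register, final XOR skipped
```

A tool that needs to match the game's values would have to confirm both the table and the missing step against the actual source before relying on anything like this.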

Quote
Does uncompressed==compressed for all noncompressed files? (I'm looking for a better/more reliable way to determine compression status than file extensions).
Yes.  When writing an uncompressed file you do it just like with a normal VP, and only go through the bzip2 calls for compressed content.
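
A rough Python sketch of that rule, using the standard `bz2` module in place of the C bzip2 API. The function names and the "keep raw data if bzip2 does not shrink it" fallback are assumptions added for the example; the excluded extensions, the level cap of 6, and the "equal sizes means uncompressed" test come from the discussion above.

```python
import bz2

NEVER_COMPRESS = {".wav", ".ogg", ".mve"}   # streamed types named in the spec

def store_file(name: str, data: bytes, level: int = 6):
    """Return (stored_bytes, uncompressed_size, compressed_size) for one file."""
    if not 1 <= level <= 6:
        raise ValueError("file compression level must be between 1 and 6")
    ext = name[name.rfind("."):].lower() if "." in name else ""
    if ext not in NEVER_COMPRESS:
        packed = bz2.compress(data, compresslevel=level)
        # Assumption: only keep the bzip2 stream when it is actually smaller,
        # so that equal sizes always mean "stored raw".
        if len(packed) < len(data):
            return packed, len(data), len(packed)
    return data, len(data), len(data)

def load_file(stored: bytes, size: int, compressed_size: int) -> bytes:
    """Undo store_file using only the two sizes from the index entry."""
    if size == compressed_size:
        return stored
    return bz2.decompress(stored)

music = store_file("briefing.ogg", b"x" * 1000)
table = store_file("ships.tbl", b"x" * 1000)
print(music[1:], table[1:])
```

With this approach a reader never has to look at the file extension: comparing the two size fields is enough to decide whether to call the decompressor.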



Oh, for anyone else that is interested, here are the docs for the bzip2 API: http://www.bzip.org/1.0.3/html/index.html

Actually, one thing that I think might come in handy would be if the signed integers for position (both TOC and file data position) were changed to unsigned integers or, better yet, 64-bit integers. This would effectively eliminate a possible future problem of VPs growing too big for signed integer positions. The BTRL Demo release is, IIRC, 650-some MB, so technology is starting to get to the point where breaking the 2GB limit is a possibility.
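
The arithmetic behind that limit, as a short Python sketch. The helper name is made up for the example; the numbers are simply the ranges of 32-bit signed and unsigned integers.

```python
SINT32_MAX = 2**31 - 1   # 2,147,483,647 bytes, just under 2 GiB
UINT32_MAX = 2**32 - 1   # 4,294,967,295 bytes, just under 4 GiB

def fits_offset_field(offset: int, signed: bool = True) -> bool:
    """True if a byte offset can be stored in a 32-bit archive field."""
    limit = SINT32_MAX if signed else UINT32_MAX
    return 0 <= offset <= limit

btrl_demo = 650 * 1024 * 1024                        # roughly the size quoted above
print(fits_offset_field(btrl_demo))                  # well inside the signed range
print(fits_offset_field(3 * 1024**3))                # a 3 GiB offset does not fit as Sint32
print(fits_offset_field(3 * 1024**3, signed=False))  # but would fit as Uint32
```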

I was actually going to originally move to 64-bit for the size and date issues, but then I decided that I just don't care.  OS support wise, it's not really worth the hassle to me.  Compression will help keep it under 2gig a bit longer, and it's not like it's that hard to just split a large VP into two smaller ones.  I just think that the coding effort required to do 64-bit size support properly greatly outweighs the general ability to avoid requiring it in the first place.

Oh, and it might be this weekend before I get you that test CVP.  I got busy doing the BtRL installers, and keeping the server running through the digg hit, and just haven't had a chance to get anything ready for you yet.  I'll try and take care of it Friday though, if possible.

My next VP tool release will support 64-bit stuff internally (it will of course check before it sticks it in the C/VP) so it shouldn't be too much of an issue when you do change it. I've just been eyeing that limit warily since I started work on the reading/writing backend. ;)

Crap.  I totally forgot about this.  :rolleyes:

I should have time to send you the file tomorrow, assuming that I actually get normal mapping 100% by then (and if I haven't killed myself in the process  :mad:).  I don't think it would be any easier for me to just see the output from your writer though, since I still haven't updated my VP extractor code for the final CVP spec, only the archiver.
-C

 

Offline The E

Re: Alternative to VP-files - compressed archives (read before freaking out)
Have you read the thread? Unlike you or taylor, we are not comfortable with the idea of using a proprietary archive format.

 

Offline chief1983

  • Still lacks a custom title
  • Moderator
  • 212
  • ⬇️⬆️⬅️⬅️🅰➡️⬇️
    • Minecraft
    • Skype
    • Steam
    • Twitter
    • Fate of the Galaxy
Re: Alternative to VP-files - compressed archives (read before freaking out)
Yeah I'd imagine that was probably one of the reasons it was never implemented - no one had the time to sit down and write/maintain the code for the new format.  Of course, no one's sitting down to get LZMA added in yet either, but maybe we can work on that shortly after 3.6.14.
Fate of the Galaxy - Now Hiring!  Apply within | Diaspora | SCP Home | Collada Importer for PCS2
Karajorma's 'How to report bugs' | Mantis
#freespace | #scp-swc | #diaspora | #SCP | #hard-light on EsperNet

"You may not sell or otherwise commercially exploit the source or things you created based on the source." -- Excerpt from FSO license, for reference

Nuclear1:  Jesus Christ zack you're a little too hamyurger for HLP right now...
iamzack:  i dont have hamynerge i just want ptatoc hips D:
redsniper:  Platonic hips?!
iamzack:  lays

 

Offline jr2

  • The Mail Man
  • 212
  • It's prounounced jayartoo 0x6A7232
    • Steam
Re: Alternative to VP-files - compressed archives (read before freaking out)
7zip is open source.  If you had to add something, you could modify .7z behavior.  It just wouldn't open with 7zip.  Unless you added support with a plugin.

 

Offline The E

Re: Alternative to VP-files - compressed archives (read before freaking out)
Quote
7zip is open source.  If you had to add something, you could modify .7z behavior.  It just wouldn't open with 7zip.  Unless you added support with a plugin.

Read the thread. Using an archive format that is not editable via existing tools (be they the various VP editors, or dedicated archivers like 7zip) is not a good idea.

 

Offline jr2

Re: Alternative to VP-files - compressed archives (read before freaking out)
How many times?  I was merely pointing out the possibility if you really had to add some functionality.  What you coders do is up to you.  :P

 

Offline The E

Re: Alternative to VP-files - compressed archives (read before freaking out)
Well, reading the thread should have given you an idea of what we are thinking of in terms of archive support, and that your suggestion was already discarded a few pages back.

 

Offline chief1983

Re: Alternative to VP-files - compressed archives (read before freaking out)
I just edited the OP, to hopefully cut down on discussion that's already been...discussed.  If there's nothing else to add, we could probably just lock this too, and get to work on implementing the LZMA support.

 

Offline jr2

Re: Alternative to VP-files - compressed archives (read before freaking out)
Quote
Well, reading the thread should have given you an idea of what we are thinking of in terms of archive support, and that your suggestion was already discarded a few pages back.

I realized that.  However I also realized that taylor knows things I don't as a non-coder.  As do you.  So I figured I'd chip in my (pretty worthless) two cents, in the off-chance that it would give someone who knows more than me an idea.

 

Offline Goober5000

  • HLP Loremaster
  • Administrator
  • 214
    • Goober5000 Productions
Re: Alternative to VP-files - compressed archives (read before freaking out)
Taylor's posts contain a few good points that haven't been addressed yet.  For example, he recommends that each file be compressed individually, instead of the whole thing as a solid block, because certain file types don't compress well (often because they're already compressed).  Another interesting point is that he says streaming compressed data is very slow, which would be a significant problem.

 

Offline Iss Mneur

  • 210
  • TODO:
Re: Alternative to VP-files - compressed archives (read before freaking out)
Quote
Taylor's posts contain a few good points that haven't been addressed yet.  For example, he recommends that each file be compressed individually, instead of the whole thing as a solid block, because certain file types don't compress well (often because they're already compressed).
Which is what a non-solid 7z archive is...

Quote
Another interesting point is that he says streaming compressed data is very slow, which would be a significant problem.
So don't compress the archive that you put the streaming media in (.7z or just a .vp).
"I love deadlines. I like the whooshing sound they make as they fly by." -Douglas Adams
wxLauncher 0.9.4 public beta (now with no config file editing for FRED) | wxLauncher 2.0 Request for Comments

 

Offline WMCoolmon

Re: Alternative to VP-files - compressed archives (read before freaking out)
Quote
Have you read the thread? Unlike you or taylor, we are not comfortable with the idea of using a proprietary archive format.

Yes, I did.

Having spent a few years watching taylor seem to do as much coding as the rest of the team combined I'm willing to believe that his design would have been well-designed and well-thought-out with regards to its application to the FS2Open engine. I see the beginnings of an idea here, but I don't see any kind of in-depth plan. I don't see anybody addressing his comment that the FS2Open CRCs are different than normal CRCs and you'd have to break the networking code to fix it, or addressing what level of compression is adequate, or whether third-party utilities would only compress the files that we want compressed. In fact, I see very little discussion or understanding of how this will work in the FS2Open engine. Will decompression be streamed for certain filetypes and not others? Which filetypes? Will the new pack files be read with the same precedence as old files? Will loading a 7z file be internally compatible with the structures that FS2Open uses to store package filesystem data?

Maybe those aren't issues now. Maybe you've rewritten the network code and the packfile code so it bears no resemblance to the old stuff and the time taylor spent researching this stuff is completely irrelevant now. I don't pretend to know.

A lot of these are issues that using a custom archive based on VPs can mitigate, but a standard archive format does raise. Sure, it's easier for users, and that's great (except where it's been pointed out that people might take that as license to edit them like archives).

I've spent probably a half-hour now reading this thread, digging up that post, and now tonight, explaining exactly how I can see it being useful. This didn't need to be a confrontation, I was trying to be helpful because I figured most people had probably not known or forgotten about that thread since it was so long ago and nobody had brought it up.

I didn't even state that I disagreed, and you're already arguing with me.

Quote
Yeah I'd imagine that was probably one of the reasons it was never implemented - no one had the time to sit down and write/maintain the code for the new format.  Of course, no one's sitting down to get LZMA added in yet either, but maybe we can work on that shortly after 3.6.14.

This is the second needless assumption that pisses me off. Not only did I end up restructuring Maja in order to support compressed files, but I actually have a CVP handler in there with an 'import' function. An 'export' function is no big deal. I was considering doing that to ease the transition if that was what this discussion came to, but now I'm afraid to admit even thinking about volunteering for fear that you'll take it as further evidence that I think you're wrong and I'm trying to subvert your plans and fight you.

Trust me, if I seriously thought you should implement a proprietary format instead of 7-zip and I wanted to fight you on it, you would be looking at pages of writing explaining exactly how I thought your implementation was flawed, exactly why you shouldn't do it, how I could do better, and the plans I was already drawing up for my implementation. As it is, all I've done is repost (and now point out) a set of problems that somebody else spent time finding out, so that you (or whoever goes to do the implementation) don't have to spend time relearning or finding them out when you make the change and the network code mysteriously quits working or something.

And where on earth did anybody get the impression that taylor endorses this or I even think taylor endorses this? The post was four years ago. A lot has changed since then in the computing world. I have no clue what taylor thinks and I am totally uncomfortable getting him involved in this now, for fear that somebody will think he has any opinion on this.

TL;DR: Don't assume that people are fighting you and accuse them of doing no work on the subject when they're spending free time trying to help you.
-C

 

Offline Zacam

  • Magnificent Bastard
  • Administrator
  • 211
  • I go Sledge-O-Matic on Spammers
    • Minecraft
    • Steam
    • Twitter
    • ModDB Feature
Re: Alternative to VP-files - compressed archives (read before freaking out)

Maybe it's assumed that we already know that the FSO CRC's don't match other "real world" crc generation. I've even mentioned attempting to provide a way for FSO to actually generate/use CRC values like you would see any archiving program provide so that when we post the data from an archival tool it can then match what the game generates in the Debug log.

However, not only does the math currently escape me, I did also realize the complexity of the networking issue and it would have to be addressed at a point by saying "This point forward will not be compatible with prior versions" which we have already done a time or twelve, just not quite as significantly.

Further, in the discussions from 4 years ago, I'm curious to know why the 32 char file name limitation would have been proposed as kept. It should have been recognized as even then being utterly ridiculous to keep, seeing as how Retail wouldn't be able to use said new format anyway.
(And even if nobody took into consideration the increasing expansion of file names from things such as adding "-normal" to them, it's still a silly notion to keep as there are hardly any filesystems that have that limitation themselves and any that do, well, I don't honestly know what to say to that scenario other than "That sucks, get something better")

Good to hear that you forward thought Maja enough to include the basic implementation idea. I love that app and use it regularly and exclusively. But now, how many people actually knew you did all that? Did you share with the team? Was there any indication that -anything- other than discussion took place? If you did, then that was then and this is now, and if not then who knew? So what is the point of being upset by it? Much less calling it a "Needless assumption".

As for addressing "what level of compression is adequate" I seem to remember reading in the first few posts a benchmarking indicating how each compression level worked out. And in various different compression types, hence why non-solid LZMA was selected and it can be up to the modders as to what level of compression they want with an eye towards the medium option as the one with the most flexibility.

And people are already taking license to edit VP's. We can either frustrate people by making it even more difficult to do so and require more "l33t" know how or we can make it easier to access and thus also easier to address/fix/troubleshoot and in general point out when they are being an idiot.

As for endorsements, I think it was just in general held as "there was an idea once, but so far nothing has happened on it. Maybe we can look at taking a stab at it ourselves". Doesn't in any way make the previous discussion obsolete in any way other than to potentially call for refactoring some of the implementation ideas. It's more about the endorsement of the Concept, not the implementation and that is only derived from the fact that the "Idea" happened and had enough behind it that a fairly well rounded technical conversation took place regarding it. If that's not an endorsement of some kind, I have no idea what is then. And again, it's on the basis of a concept or idea, not necessarily to a specific method or implementation. And it's also in reference to the past tense. We can say that either of you endorsed the idea as a concept then. That does not automagically translate into endorsing the concept/idea/implementation as it is today.

I have to admit that I'm at a total loss of understanding where or how the discussion in here has managed to take on any sort of hostile overtones or perceptions thereof. Thank you for finding and sharing the information, some of the non-internal access people may find it valuable and useful. Can we all just get along now?
Report MediaVP issues, now on the MediaVP Mantis! Read all about it Here!
Talk with the FSU on #SCP-FSU Talk with the SCP on #SCP
"If you can keep a level head in all this confusion, you just don't understand the situation"

¤[D+¬>

[08/01 16:53:11] <sigtau> EveningTea: I have decided that I am a 32-bit registerkin.  Pronouns are eax, ebx, ecx, edx.
[08/01 16:53:31] <EveningTea> dhauidahh
[08/01 16:53:32] <EveningTea> sak
[08/01 16:53:40] * EveningTea froths at the mouth
[08/01 16:53:40] <sigtau> i broke him, boys

 

Offline Goober5000

Re: Alternative to VP-files - compressed archives (read before freaking out)
I think WMC still has some battle scars from back when he and I used to argue about SCP features.  He and I had quite a few disagreements, and he has made a couple of commits that have caused me to nearly pull my hair out (the eight-stage Transcend warpout bug being one), but he is an excellent coder and he does have a mind for design.  The camera system and the scripting system both involved a lot of planning and long-range thinking, and those are things we really need to keep in mind when discussing a core feature like CVPs.  So WMC can offer a lot of insight on this topic, especially from his experience on Maja, which is about the closest anyone's ever come to working on this feature.

We should all take care not to be overly confrontational in our discussion (which I have had a tendency to do, and I think Zacam and The E do also).  As long as we remember to debate the topic and not the person, I think we can hammer out some good points in this thread. :yes:

 

Offline Tomo

  • 28
Re: Alternative to VP-files - compressed archives (read before freaking out)
I suspect that the main reason Taylor was considering 'home-grown' compressed archives was simply down to a lack of open-source ones at the time.

There is also a tendency within any team to ignore externally-developed useful code in favour of writing it themselves. This "Not Invented Here" attitude is clearly seen to be foolish once you manage to step back, but that kind of perspective tends to be hard to maintain.

Given that the 7z libraries are freely available now, there really is no point in building a home-grown version - linking in the 7z library hands us a lot of well-tested stuff for free.

- Plus, if done right, it means that we can trivially add other packaging systems at a later date, should newer open-source BSD/L-GPL or other appropriately licensed stuff become available.
There's quite a lot of other stuff that wants doing to cfile.cpp anyway, to remove the requirement of short pathnames and write access to the installation folder.

 

Offline chief1983

Re: Alternative to VP-files - compressed archives (read before freaking out)
Heh, I started looking into some of the public formats used by other games just now.  .pk3 is just a zip file!  I mean, if that worked for Quake 3, Quake 4, and Doom 3, surely LZMA should work for us?

 

Offline jr2

Re: Alternative to VP-files - compressed archives (read before freaking out)
If it's at all easy to do, why not implement it and do a test run using the new format to find any problems and compare?

 

Offline Halleck

  • 24
Re: Alternative to VP-files - compressed archives (read before freaking out)
An aside, the Ur-Quan Masters project uses zip in the same way as we use VP's. They were getting a lot of confused people in their support forum who downloaded one of their enhanced content modules, saw it was a zip, extracted it, and were confused as to why the content wasn't loading.

The solution they found was to rename the file extension to .uqm. That way, random people didn't just open them up, but they could be easily opened with any archive utility by people in the know. Just a thought in case we have the same issue down the line, we could use the extension .fs2 instead of .7z.
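
A quick Python sketch of why the rename trick works: archive readers generally identify the format from the file's contents, not its name. Python's standard library has no 7z reader, so this uses zip, the same format UQM ships. The file name `content.uqm` and the archive member path are made up for the example.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # An ordinary zip archive written under a game-specific extension.
    path = os.path.join(tmp, "content.uqm")
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("data/ships.tbl", "#Ship Classes\n")

    # The format is recognised from the bytes, not from the file name.
    recognised = zipfile.is_zipfile(path)
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()

print(recognised, names)
```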

Edit: I didn't read the whole thread but I see above that this has already been brought up. In any case, this is a solution that is known to work well for other games (including quake, apparently!)
« Last Edit: June 10, 2011, 05:18:57 am by Halleck »