EDIT: Here's the TL;DR of the rest of the thread so far (4/25/11).
We don't want to use a new, proprietary format, or a modified existing format, because that would require additional tools that we'd have to maintain. It's basically been decided that non-solid 7z (LZMA) would be the best choice for this format, as 7-Zip already exists on all supported platforms for editing archives, and the SDK is open source and compatible with FSO. There's no real need to discuss this further; we just need to implement it now.
Chief
--Now here's the rest of the original post--
There was once discussion about supporting file formats other than Volition's vp file format for game assets. Obviously the usual suspects of zip, rar and 7z all popped up. They all offer a variety of compression options, or no compression at all. Plus they're easier to manage than vp files, regardless of the vp managers we have. Would there be other advantages? Certainly, but let's get back to those later.
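To illustrate what "a variety of compression options or no compression at all" means in practice, here is a rough sketch using Python's standard zipfile module. The payload and the file name ships.tbl are made up for illustration, and the ZIP container's LZMA method is plain LZMA rather than the LZMA2 tested later, so treat the sizes as a demonstration only.

```python
import io
import zipfile

# Hypothetical stand-in for a game asset: repetitive table text.
payload = b"$Name: GTF Ulysses\n$Shields: 100\n$Hull: 250\n" * 2000

# Compression methods the ZIP container can use per file.
methods = {
    "stored (no compression)": zipfile.ZIP_STORED,
    "deflate": zipfile.ZIP_DEFLATED,
    "bzip2": zipfile.ZIP_BZIP2,
    "lzma": zipfile.ZIP_LZMA,
}

def archive_size(method):
    """Write the payload into an in-memory zip and return the archive size."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=method) as archive:
        archive.writestr("ships.tbl", payload)
    return len(buffer.getvalue())

sizes = {name: archive_size(method) for name, method in methods.items()}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

The same archive can mix methods per file, which is why "no compression" stays available for data that is already compressed.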
I decided to perform some tests with data that is very relevant to our use scenario. Compression tests were done with exported MV_Advanced, MV_AnimGlows, MV_Assets, MV_CB_ANI_1, MV_CB_ANI_2, MV_Effects, MV_Intel_ANI, MV_Music, MV_radaricons and MV_Root from FSU SVN. Compression was done using 7-Zip 9.20 utilizing four threads.
For comparison, here are uncompressed and deflate. Deflate is the standard compression method used in the ZIP archives we've all grown to love and hate.
Uncompressed 2.54 GB
Deflate Fastest 1.08 GB
Deflate Normal 1.02 GB
Deflate Ultra 1.01 GB
BZip2 is so slow that it's not even worth testing. Its compression ratio is also worse than that of PPMd and LZMA, though better than deflate.
PPMd is the latest addition to ZIP, RAR and 7-Zip archivers.
PPMd Fastest 928 MB
PPMd Normal 860 MB
PPMd Ultra 840 MB
Now, let's test LZMA2, which is the successor to LZMA, a popular compression algorithm used in 7-Zip.
LZMA2 Fastest 990 MB
LZMA2 Normal 844 MB
LZMA2 Ultra 837 MB
Next, a set otherwise identical to the previous LZMA2 archives, except that these archives are not solid. In solid archives, files are compressed together as contiguous data blocks. This means that to extract one file from the archive, you also need to decompress the other files in the same data block. The size of a solid data block depends on the compression settings; it can be as small as 1 MB or encompass the whole archive regardless of its size. Contiguous data blocks increase compression efficiency. Zip archives, such as deflate-compressed ones, are non-solid. A non-solid archive is what you need for "stream decompressing", or streaming.
LZMA2 Fastest 997 MB
LZMA2 Normal 887 MB
LZMA2 Ultra 885 MB
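The solid versus non-solid trade-off can be sketched in a few lines. This is only an illustration, assuming Python's standard lzma module (its default .xz container uses LZMA2) as a stand-in for 7-Zip; the file names and contents are invented, not taken from the MediaVPs.

```python
import lzma

# Hypothetical stand-in data: many small, similar files.
files = {
    f"table{i}.tbl": (b"$Name: Ship %d\n" % i) + b"$Shields: 100\n$Hull: 250\n" * 40
    for i in range(50)
}

# Non-solid: every file is its own independent LZMA2 stream.
non_solid = {name: lzma.compress(data) for name, data in files.items()}
non_solid_total = sum(len(blob) for blob in non_solid.values())

# Solid: all files are concatenated and compressed as one contiguous block.
solid = lzma.compress(b"".join(files.values()))

print("non-solid:", non_solid_total, "bytes")
print("solid:    ", len(solid), "bytes")

# Non-solid: reading one file only touches that file's stream.
one_file = lzma.decompress(non_solid["table7.tbl"])

# Solid: the whole block has to be decompressed before any file can be used.
whole_block = lzma.decompress(solid)
```

The solid block comes out smaller because redundancy shared between files is only stored once, but pulling out a single file costs a full decompression of the block, which is what makes solid archives a poor fit for loading assets on demand.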
Now let's review how much RAM is needed to decompress these archives. Whether an archive is solid or not has no impact on memory usage.
Deflate Fastest 2 MB
Deflate Normal 2 MB
Deflate Ultra 2 MB
PPMd Fastest 3 MB
PPMd Normal 18 MB
PPMd Ultra 130 MB
LZMA2 Fastest 3 MB
LZMA2 Normal 18 MB
LZMA2 Ultra 66 MB
Right, so that covers how small the archives are and how much RAM is needed to decompress them. Compression speed is mostly irrelevant in our use scenario, so I'm skipping that. The bigger question is: how fast is decompression? Let's try it out. Both the archive and the target directory for the extracted content are on the same physical hard drive, which matches our typical use scenario. In this test I'm using non-solid LZMA2 archives, as solid archives are unusable in our typical use scenario.
Deflate Fastest 2:16 min
Deflate Normal 2:16 min
Deflate Ultra 2:17 min
PPMd Fastest 8:53 min
PPMd Normal 11:43 min
PPMd Ultra 18:46 min
LZMA2 Fastest 2:08 min
LZMA2 Normal 2:10 min
LZMA2 Ultra 2:16 min
From these tests we can immediately rule out PPMd as a usable compression algorithm for our use scenario. LZMA2, on the other hand, performs admirably, beating even deflate in decompression speed. It is no wonder that some Linux distros have adopted LZMA2 compression for their installation CDs and repositories, and that it is slowly gaining popularity on mobile devices for these very reasons.
And so, we have one last question remaining. Just how does LZMA2 compare to an uncompressed archive? Again, let's find out!
Uncompressed 2:04 min
Are your eyes deceiving you? Certainly not. The fastest LZMA2 decompression is only 4 seconds slower, normal 6 seconds and ultra 12 seconds. And in this test scenario we decompressed 2.54 GB worth of data. The difference in speed should be smaller during most mission loads.
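For anyone who wants to check the arithmetic or plug in their own timings, a small sketch that turns the m:ss figures from this post into seconds and relative overhead against the uncompressed baseline. The helper name to_seconds is just something made up for this example.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' timing string into whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

# Timings copied from the extraction tests in this post.
baseline = to_seconds("2:04")  # uncompressed
timings = {
    "LZMA2 Fastest": "2:08",
    "LZMA2 Normal": "2:10",
    "LZMA2 Ultra": "2:16",
}

# Extra seconds each preset needs compared to the uncompressed run.
overhead = {label: to_seconds(t) - baseline for label, t in timings.items()}

for label, extra in overhead.items():
    print(f"{label}: +{extra} s ({extra / baseline:.1%} slower than uncompressed)")
```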
Now we can get to the benefits. I can think of the following benefits of having compressed LZMA2 archives play the role of vp files:
- LZMA2 archives have pretty good checksum checks, so a corrupted archive throws an error. With vp files you may just end up wondering wtf is going on without realizing the cause, until JeffVader points out that the checksums in your debug log don't match.
- Mods and campaigns can be downloaded to mod folders as-is. This is even more convenient with installers such as FSO Installer and Desura.
- Normally you'd keep both the extracted vp files and the downloaded compressed archive around. With support for compressed archives that work like vp files, you'd only need to keep the compressed archive. This saves a lot of disk space if you have a small HDD.
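The checksum benefit from the first point is easy to demonstrate. A minimal sketch, assuming Python's standard lzma module as a stand-in: it writes an .xz container (which carries a CRC64 check by default) rather than a 7z archive, and the file contents and the load helper are invented for the example.

```python
import lzma

# Hypothetical file contents standing in for a game asset.
original = b"$Name: GTF Ulysses\n$Shields: 100\n" * 500

# Compress into an .xz container; the default integrity check is CRC64.
packed = bytearray(lzma.compress(original))

# Flip one byte in the middle of the compressed stream to simulate corruption.
packed[len(packed) // 2] ^= 0xFF

def load(blob):
    """Return the decompressed data, or None if the archive is damaged."""
    try:
        return lzma.decompress(bytes(blob))
    except lzma.LZMAError as error:
        print("archive is corrupt:", error)
        return None

damaged = load(packed)                  # corruption is reported, not silently loaded
intact = load(lzma.compress(original))  # an undamaged copy loads normally
```

The point is that the failure is loud and immediate at load time, instead of showing up later as a mismatched checksum in a debug log.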
Nobody is forcing anyone to switch from vp files to whatever compressed archives, but it'd be real nice to have the option. Especially since the difference in speed doesn't really seem to be that much.