Author Topic: aaaaaaaaaaaargh (7zip rant)  (Read 8615 times)

Offline Goober5000

  • HLP Loremaster
  • 214
    • Goober5000 Productions
aaaaaaaaaaaargh (7zip rant)
As those of you who have SCP SVN access may know, I'm currently working on a rewrite of the FSO Installer.  One of the high priority feature requests is 7zip support.

Unfortunately, the 7zip SDK is atrocious.  No in-line source comments, very little documentation, and all you get is a Decoder and an Encoder class.  That only works if you have a simple stream of bytes you want to compress using LZMA, and it's useless if you have a 7zip container with a bunch of files.

That's not the subject of the rant though, because some kind and benevolent soul figured out how to use the JNI (Java Native Interface) to wrap the actual 7zip application libraries using Java classes.  It's here at 7-Zip-JBinding, and it has bindings available for all the major platforms - Windows, OSX, and Linux.  Which is A-1 SUPAR.

The problem I have is that 7-zip puts the table of contents AT THE END OF THE FILE.  Good old vanilla Zip puts it at the beginning.  This means that you have to seek to the end of the file before you know what the file contains.

Turey was clever enough to figure out that Java allows you to unzip regular Zip files on-the-fly, which is to say that you're downloading and extracting in the same operation.  And you only download the out-of-date files.  This is extremely handy because downloading a stream of bytes from the internet is a strictly sequential operation.
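
For anyone who hasn't seen the on-the-fly approach, here is a minimal sketch of it using java.util.zip.ZipInputStream from the core JDK.  It is not the installer's actual code: the class name, the sample file names, and the outOfDate predicate are all made up for illustration, and an in-memory archive stands in for the download.  Note also that when reading this way, a ZipEntry may not report its size until its data has been read, depending on how the archive was written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class StreamingUnzipSketch {

    /**
     * Reads a Zip archive strictly front to back and keeps only the entries
     * selected by the (hypothetical) outOfDate predicate.
     */
    static Map<String, byte[]> extractOutOfDate(InputStream download,
                                                Predicate<String> outOfDate) throws IOException {
        Map<String, byte[]> extracted = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(download)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || !outOfDate.test(entry.getName())) {
                    continue; // the next getNextEntry() call skips this entry's unread data
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                extracted.put(entry.getName(), out.toByteArray());
            }
        }
        return extracted;
    }

    /** Builds a small in-memory archive so the sketch runs without a network. */
    static byte[] sampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("current.vp"));
            zip.write("unchanged".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("stale.vp"));
            zip.write("new contents".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream download = new ByteArrayInputStream(sampleArchive());
        Map<String, byte[]> result =
                extractOutOfDate(download, name -> name.equals("stale.vp"));
        System.out.println(result.keySet());
    }
}
```

The stream is never rewound; each entry is either extracted or skipped as it goes by, which is why this fits a download so well.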

Guess what... 7zip was apparently designed on the assumption that random-access would be the norm.  So not only does 7zip require you to seek to the end of the file, it seeks back and forth a lot once it's there.  Unfortunately, in a forward-only stream, seeking backwards means that you have to start reading all over again and seek forward to that location from the beginning.
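
To put a number on the cost of seeking backwards, here is a rough sketch of what a "seek" has to do on top of a forward-only stream.  The class and the reopen supplier are hypothetical stand-ins (a real download would reopen the HTTP connection), and this is not how 7-Zip-JBinding or the installer implement it; it only illustrates the rewind-and-reread behavior described above.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

class ForwardOnlySeekSketch {
    private final Supplier<InputStream> reopen; // hypothetical: restarts the download at byte 0
    private InputStream stream;
    private long position = 0;
    private long bytesConsumed = 0;

    ForwardOnlySeekSketch(Supplier<InputStream> reopen) {
        this.reopen = reopen;
        this.stream = reopen.get();
    }

    /** Moves to an absolute offset; going backwards means starting over. */
    void seek(long target) throws IOException {
        if (target < position) {
            stream.close();
            stream = reopen.get();
            position = 0;
        }
        while (position < target) {
            long skipped = stream.skip(target - position);
            if (skipped <= 0) {
                // skip() may legitimately return 0; fall back to reading one byte
                if (stream.read() == -1) {
                    throw new EOFException("seek past end of stream");
                }
                skipped = 1;
            }
            position += skipped;
            bytesConsumed += skipped;
        }
    }

    /** Total bytes pulled through the stream across all seeks. */
    long bytesConsumed() {
        return bytesConsumed;
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = new byte[100]; // stands in for a 100-byte download
        ForwardOnlySeekSketch s =
                new ForwardOnlySeekSketch(() -> new ByteArrayInputStream(archive));
        s.seek(90); // jump near the end: 90 bytes
        s.seek(10); // backwards: reopen, then 10 bytes
        s.seek(95); // forwards again: 85 bytes
        System.out.println(s.bytesConsumed()); // prints 185
    }
}
```

Three seeks on a 100-byte "file" pull 185 bytes through the stream, which is the back-and-forth problem in miniature.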

Using predictive buffering and some smart seeking strategies, I was able to smooth everything out so that only one seek, from beginning to end, is required to get all the necessary information.  Unfortunately, that one seek is necessary due to the inherent design of 7zip; it's as if you have to download the file before you can download the file.  On the 7.4 MB file I used for testing, that takes about 12 seconds.


So that's the current story.  There are several paths I can take from here, and I'd like to know what people would prefer:

1) Drop 7zip support, and enforce Zip only.  (I'm not seriously considering this option.  First, I've already done a lot of work on it; second, 7zip allows the extraction of not only .7z files but also .rar, .bzip, .gzip, and a bunch of others.)

2) Leave the behavior as it is, and tell people they'll just have to live with the delay that occurs prior to downloading.

3) Change the behavior so that the installer will always download 7zip files before checking whether they need to be extracted.  I'm not sure that this will save any time because in both cases the installer has to first move across the internet connection from start to finish, and then (optionally) extract something from it.  The only significant difference is that in the current case the extraction happens over an internet connection, whereas in this case it would occur on the local hard drive, saving bandwidth.  However, the user would have to live with a potentially large, potentially unnecessary file being downloaded to, and immediately deleted from, a temporary folder.
« Last Edit: February 28, 2010, 11:45:45 pm by Goober5000 »

 

Offline Bobboau

  • Just a MODern kinda guy
    Just MODerately cool
    And MODest too
  • 213
Re: aaaaaaaaaaaargh (7zip rant)
I would say 3 or 2.
Bobboau, bringing you products that work... in theory
learn to use PCS
creator of the ProXimus Procedural Texture and Effect Generator
My latest build of PCS2, get it while it's hot!
PCS 2.0.3


DEUTERONOMY 22:11
Thou shalt not wear a garment of diverse sorts, [as] of woollen and linen together

 
Re: aaaaaaaaaaaargh (7zip rant)
This is seriously a case of design THEN code (and coding should take < 50% of the total time).
It's also a case of not understanding what the format is supposed to be doing.

I suggest 1. The other two are hacks. Just because you've spent a lot of time coding something doesn't mean it shouldn't be dropped. It just means you didn't think ahead, and worked with features rather than design.
STRONGTEA. Why can't the x86 be sane?

 

Offline headdie

  • i don't use punctuation lol
  • 212
  • Lawful Neutral with a Chaotic outook
    • Minecraft
    • Skype
    • Twitter
    • Headdie on Deviant Art
Re: aaaaaaaaaaaargh (7zip rant)
does this re-reading affect how much data is being sent down the connection?
Minister of Interstellar Affairs Sol Union - Retired
quote General Battuta - "FRED is canon!"
My Release Thread, Old Release Thread, Celestial Objects Thread, My rubbish attempts at art

 

Offline Iss Mneur

  • 210
  • TODO:
Re: aaaaaaaaaaaargh (7zip rant)
Personally I am not entirely sure why this is an issue.  That is, why do you need to read the TOC of the archive before you download the rest of the archive?

The only answer that I can see, and that Goober5000 sort of hints at is that the Installer is using the zip's TOC as a sort of manifest to see what files have changed. 

This strikes me as a clever hack that could be very problematic as it is based on the fact that the stored file modified date and/or file size will change when the file contents changes.  Unfortunately this is not always true.  I have personally run into zip archives that do not have any file times set. (though I can't say that I have noticed this with any FS related files, if only because I have not looked).

I have to agree with Zacam's suggestion on #scp when Goober5000 brought this thread up, about external manifests.  I think the external manifests would be a better solution to the various inadequacies of the different archive formats.  It would also allow for other improvements to the mod system that have been proposed over the last few months.

That being said, you could do number 1 and just not support .7z itself if you want to have other file formats.  Though, as we talked about on #scp, the tar-based formats would have a similar issue: because the TOC information is interspersed with the files, they would require downloading the entire file as well.
"I love deadlines. I like the whooshing sound they make as they fly by." -Douglas Adams
wxLauncher 0.9.4 public beta (now with no config file editing for FRED) | wxLauncher 2.0 Request for Comments

 

Offline Mika

  • 28
Re: aaaaaaaaaaaargh (7zip rant)
I think this is a case of #1. Building stuff on top of that is going to be shaky at best and might become impossible later. After seeing this, I wonder no more why 7ZIP never became popular, as RAR and ZIP still seem to dominate.

The old software wisdom goes that sometimes it is simply better to abandon the old stuff despite the work that has been put into it, start from scratch, and rebuild it to a better standard, as in the end it usually turns out you have still saved time and lots of headaches. Of course, this being a rather small thing, it might not turn out to be so, but you can also think that the work you did will help you accomplish something related in real-life work.

I think you are lucky, as this is more like a hobby project. I have managed to waste thousands of euros trying to build something on top of software that ultimately turned out to be incapable of doing what we supposed it could do. And that was at my actual job. Not a very happy feeling once I realized it, but we overcame it.

Good luck
Relaxed movement is always more effective than forced movement.

 

Offline Goober5000

  • HLP Loremaster
  • 214
    • Goober5000 Productions
Re: aaaaaaaaaaaargh (7zip rant)
Quote
This is seriously a case of design THEN code (and coding should take < 50% of the total time).
It's also a case of not understanding what the format is supposed to be doing.

I suggest 1. The other two are hacks. Just because you've spent a lot of time coding something doesn't mean it shouldn't be dropped. It just means you didn't think ahead, and worked with features rather than design.
You've managed to completely miss the entire problem.

This is not a case of design-then-code, this is a case of taking two incompatible designs and trying to find the best way to make them work together.  The Java stream API is well documented, extensively tested, and heavily used in all manner of applications.  The canonical FSO Installer code accesses and downloads the contents of Zip files using a standard algorithm, one which is also heavily used, and one for which the Zip file format is well suited.  It's so well suited, in fact, that Zip support is included in the core JDK.  The problem at hand is how to take the idiosyncratic 7Zip API and make it work with the established standard.

Now I incorrectly stated that the Zip format puts the table of contents at the beginning; in fact, it puts it at the end, just like 7Zip.  So if it's possible to access the contents of a Zip file sequentially, using the local file headers instead of the TOC, then the same should be true of 7Zip.  Unfortunately, the 7Zip API doesn't provide a method for doing this.  If the API could be enhanced or modified, then this would provide a fourth solution to the problem.

And you ought to know that I've happily dropped projects that I've worked on for much longer than this.  It should be clear from my post that the reason I want to keep 7Zip is because of the support it offers for different file formats, not because of the time I spent working on it.


Quote
does this re-reading affect how much data is being sent down the connection?
It depends on the server.  But in general, seeking to the end of the file counts as one read, and extracting the file counts as another read.  So you're traversing the file twice.


Quote
Personally I am not entirely sure why this is an issue.  That is, why do you need to read the TOC of the archive before you download the rest of the archive?
You don't, as I learned today.  But 7Zip reads it anyway, as soon as you open the archive, and doesn't provide a method to skip this step.

Quote
The only answer that I can see, and that Goober5000 sort of hints at is that the Installer is using the zip's TOC as a sort of manifest to see what files have changed.

This strikes me as a clever hack that could be very problematic as it is based on the fact that the stored file modified date and/or file size will change when the file contents changes.  Unfortunately this is not always true.  I have personally run into zip archives that do not have any file times set. (though I can't say that I have noticed this with any FS related files, if only because I have not looked).
Actually, the installer checks the file size, not the modification date.  I was skeptical about this strategy too, but it has worked for several years without problems.


Quote
I have to agree with Zacam's suggestion on #scp when Goober5000 brought this thread up, about external manifests.  I think the external manifests would be a better solution to the various inadequacies of the different archive formats.  It would also allow for other improvements to the mod system that have been proposed over the last few months.

That being said, you could do number 1 and just not support .7z itself if you want to have other file formats.  Though, as we talked about on #scp, the tar-based formats would have a similar issue: because the TOC information is interspersed with the files, they would require downloading the entire file as well.
External manifests are a good idea, but they cause problems with maintenance and backward compatibility.  They also might be redundant, especially if this API problem can be solved.

And judging by the sense of the forum, 7Zip support is pretty much a requirement in any Installer upgrade.

 
Re: aaaaaaaaaaaargh (7zip rant)
I still don't get why 7Zip is a requirement - just because popular consensus says 'we want it' doesn't mean they can have it.
It's an inappropriate technology for the task. That is why #1 would be the only acceptable solution for high-quality software.

Additionally, you know as well as I do that file sizes are absolutely inappropriate for this use.
« Last Edit: March 02, 2010, 01:14:41 am by portej05 »
STRONGTEA. Why can't the x86 be sane?

 

Offline Goober5000

  • HLP Loremaster
  • 214
    • Goober5000 Productions
Re: aaaaaaaaaaaargh (7zip rant)
Quote
I still don't get why 7Zip is a requirement - just because popular consensus says 'we want it' doesn't mean they can have it.
While this is true in principle (take GeoMod for example), this is a far cry from such an extreme case.  First of all, 7Zip support is, if not the very top, one of the top three requested features for the installer upgrade.  Second of all, 7Zip use is widespread on the forum and regularly used for thread downloads; it would be highly inconvenient for a project to maintain both a preferred 7Zip download and a Zip download for the installer.  Third of all, 7Zip support doesn't merely enable extraction of .7z files, it also enables extraction of .rar and a ton of additional formats, all in one single library.  (A couple of years ago, RAR was as popular on the forum as 7Zip is now.)  Fourth of all, and most importantly, 7zip extraction actually works now; it's just somewhat inconvenient.

Quote
It's an inappropriate technology for the task. That is why #1 would be the only acceptable solution for high-quality software.
This is, I think, another case where you're focusing entirely on the academic and theoretical arguments, and ignoring or dismissing the practical requirements.  It's basically a recipe for shooting yourself in the foot.  Don't worry, experience will help with that. :)

Quote
Additionally, you know as well as I do that file sizes are absolutely inappropriate for this use.
As I said above, it has worked for many years.  There have been absolutely no complaints about the installer spuriously redownloading files based on an incorrect file size comparison.  So while it's something to keep in mind, it's an extremely low priority.

 

Offline Fury

  • The Curmudgeon
  • 213
Re: aaaaaaaaaaaargh (7zip rant)
7Zip is far superior to zip and other older algorithms in compression. Where a zip-compressed BP would be 451 MB, the 7z is only 357 MB. I, for one, will never make zip downloads for Blue Planet, simply because of the large difference in download sizes. 7z is more convenient for both uploader and downloader.

I really don't care at all about 7z's inability to extract on the fly. The time saved by downloading significantly smaller files outweighs the cost of downloading everything first and only then extracting. HDD space is much less of an issue than internet speed.

Zip sucks as a compression algorithm for binary files.

 

Offline Aardwolf

  • 211
  • Posts: 16,384
    • Minecraft
Re: aaaaaaaaaaaargh (7zip rant)
Quote
I still don't get why 7Zip is a requirement - just because popular consensus says 'we want it' doesn't mean they can have it.
While this is true in principle (take GeoMod for example), this is a far cry from such an extreme case.  First of all, 7Zip support is, if not the very top, one of the top three requested features for the installer upgrade.  Second of all, 7Zip use is widespread on the forum and regularly used for thread downloads; it would be highly inconvenient for a project to maintain both a preferred 7Zip download and a Zip download for the installer.  Third of all, 7Zip support doesn't merely enable extraction of .7z files, it also enables extraction of .rar and a ton of additional formats, all in one single library.  (A couple of years ago, RAR was as popular on the forum as 7Zip is now.)  Fourth of all, and most importantly, 7zip extraction actually works now; it's just somewhat inconvenient.

I agree... the launcher should feature GeoMod.  :drevil:

 

Offline chief1983

  • Still lacks a custom title
  • 212
  • ⬇️⬆️⬅️⬅️🅰➡️⬇️
    • Minecraft
    • Skype
    • Steam
    • Twitter
    • Fate of the Galaxy
Re: aaaaaaaaaaaargh (7zip rant)
Since when did this SDK support other compression formats besides LZMA?

And I would much rather see the launcher not rely on file sizes.  What about checking the files for corruption?  A flipped bit won't change the file size but it'll corrupt the entire thing.  Hashing is a requested change right up there with getting rid of file size only verification.
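
A quick sketch of the flipped-bit point, using java.security.MessageDigest from the JDK.  SHA-256 is just an assumed choice of hash here (nothing in this thread settles on one), and the class, the helper names, and the sample data are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SizeVersusHashSketch {

    /** Returns the SHA-256 digest of the data as lowercase hex. */
    static String sha256Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns a copy of the data with the lowest bit of one byte flipped. */
    static byte[] flipLowestBit(byte[] data, int index) {
        byte[] copy = data.clone();
        copy[index] ^= 0x01;
        return copy;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] original = "pretend this is a VP file".getBytes(StandardCharsets.UTF_8);
        byte[] corrupted = flipLowestBit(original, 3);

        // A size-only check cannot tell the two apart...
        System.out.println("same size: " + (original.length == corrupted.length));
        // ...but a hash comparison can.
        System.out.println("same hash: " + sha256Hex(original).equals(sha256Hex(corrupted)));
    }
}
```

The trade-off is that hashing a local file means reading all of it, whereas a size check only needs the file system's metadata.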
Fate of the Galaxy - Now Hiring!  Apply within | Diaspora | SCP Home | Collada Importer for PCS2
Karajorma's 'How to report bugs' | Mantis
#freespace | #scp-swc | #diaspora | #SCP | #hard-light on EsperNet

"You may not sell or otherwise commercially exploit the source or things you created based on the source." -- Excerpt from FSO license, for reference

Nuclear1:  Jesus Christ zack you're a little too hamyurger for HLP right now...
iamzack:  i dont have hamynerge i just want ptatoc hips D:
redsniper:  Platonic hips?!
iamzack:  lays

 

Offline Goober5000

  • HLP Loremaster
  • 214
    • Goober5000 Productions
Re: aaaaaaaaaaaargh (7zip rant)
This new feature doesn't use the SDK (which is next to useless); it uses a Java binding of the entire 7Zip extraction mechanism.  So it can extract anything 7Zip can.

 

Offline chief1983

  • Still lacks a custom title
  • 212
  • ⬇️⬆️⬅️⬅️🅰➡️⬇️
    • Minecraft
    • Skype
    • Steam
    • Twitter
    • Fate of the Galaxy
Re: aaaaaaaaaaaargh (7zip rant)
So you're including a chunk of 7-zip itself in the installer then?  Ok.

 
Re: aaaaaaaaaaaargh (7zip rant)
Don't forget that 7zip is LGPL/unRAR, so not only will you need to make sure you use the dynamic library, you will also have to distribute it.

You're not thinking about the whole ecosystem architecture. I'd also take this opportunity to point out that if you're going the direction I think you're going with this, you're introducing a rather large security vulnerability into FSO (and we _KNOW_ that code execution vulnerabilities have been found in FSO).
STRONGTEA. Why can't the x86 be sane?

 

Offline Goober5000

  • HLP Loremaster
  • 214
    • Goober5000 Productions
Re: aaaaaaaaaaaargh (7zip rant)
You continue to make unsupported assumptions.  I'm well aware of the licensing issues; the FSO Installer will be (and is already, actually) licensed under the GPL, and its source code will be (and is already) freely downloadable.

The FSO Installer is not the same application as FSO itself.  It's not even written in the same language as FSO.

 

Offline chief1983

  • Still lacks a custom title
  • 212
  • ⬇️⬆️⬅️⬅️🅰➡️⬇️
    • Minecraft
    • Skype
    • Steam
    • Twitter
    • Fate of the Galaxy
Re: aaaaaaaaaaaargh (7zip rant)
Instead of using the 7-zip header why not store the info we need about the files elsewhere?  Using some hashing and a table or something.
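
Something like the following is what the "hashing and a table" idea could look like.  This is only a sketch under assumptions: the manifest is taken to be a plain map from file name to hex digest that has already been fetched and parsed somehow, and the class name, file names, and digest values are placeholders rather than anything the installer defines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ManifestDiffSketch {

    /**
     * Compares a published manifest (file name -> digest) with the digests of
     * the files already installed, and lists what still has to be fetched.
     */
    static List<String> filesToDownload(Map<String, String> published,
                                        Map<String, String> installed) {
        List<String> needed = new ArrayList<>();
        for (Map.Entry<String, String> entry : published.entrySet()) {
            String localDigest = installed.get(entry.getKey()); // null if the file is missing
            if (!entry.getValue().equals(localDigest)) {
                needed.add(entry.getKey());
            }
        }
        return needed;
    }

    public static void main(String[] args) {
        Map<String, String> published = new LinkedHashMap<>();
        published.put("core.vp", "aa11");    // placeholder digests, not real hashes
        published.put("effects.vp", "bb22");
        published.put("music.vp", "cc33");

        Map<String, String> installed = new LinkedHashMap<>();
        installed.put("core.vp", "aa11");    // up to date
        installed.put("effects.vp", "0000"); // changed or corrupted

        System.out.println(filesToDownload(published, installed)); // prints [effects.vp, music.vp]
    }
}
```

With a table like this the archive's own header never has to be consulted to decide what to download, whichever archive format is used.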

  

Offline jr2

  • The Mail Man
  • 212
  • It's prounounced jayartoo 0x6A7232
    • Steam
Re: aaaaaaaaaaaargh (7zip rant)
Quote
Instead of using the 7-zip header why not store the info we need about the files elsewhere?  Using some hashing and a table or something.

And is it possible to download just the last part of a file first and then read it?  I know download managers can resume files, so why not get the file size, download the last xx bytes (I don't really know if this is feasible, so please forgive me if it's out of the question), and read the TOC?
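
For what it's worth, HTTP does have a way to ask for just the tail of a file: a Range header with a suffix byte range ("bytes=-N" means the last N bytes).  Below is a minimal sketch using java.net.HttpURLConnection.  The class and method names are made up, whether a given download mirror honors Range requests is an assumption, and this says nothing about whether the 7-Zip binding could be fed the result.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

class TailFetchSketch {

    /** Builds a suffix byte range asking for the last lastBytes bytes. */
    static String suffixRange(long lastBytes) {
        if (lastBytes <= 0) {
            throw new IllegalArgumentException("lastBytes must be positive");
        }
        return "bytes=-" + lastBytes;
    }

    /** Fetches only the tail of a remote file, or fails if the server sends anything else. */
    static byte[] fetchTail(URL archive, long lastBytes) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) archive.openConnection();
        connection.setRequestProperty("Range", suffixRange(lastBytes));
        try {
            // 206 Partial Content means the range was honored; a plain 200
            // means the server ignored it and is sending the whole file.
            if (connection.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
                throw new IOException("server did not honor the Range request");
            }
            try (InputStream in = connection.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            }
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) {
        // No network access here; just show the header that would be sent.
        System.out.println("Range: " + suffixRange(65536));
    }
}
```

How many bytes to ask for is the open question, since the size of the TOC isn't known up front; a guess that turns out too small would mean a second request.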

 

Offline chief1983

  • Still lacks a custom title
  • 212
  • ⬇️⬆️⬅️⬅️🅰➡️⬇️
    • Minecraft
    • Skype
    • Steam
    • Twitter
    • Fate of the Galaxy
Re: aaaaaaaaaaaargh (7zip rant)
It's probably as feasible as how it was being done.  But the actual file structure of the rest of the archive might still be a problem.  You can't just get one file out of a 7-zip.

 

Offline Galemp

  • Actual father of Samus
  • 212
  • Ask me about GORT!
    • Steam
    • User page on the FreeSpace Wiki
Re: aaaaaaaaaaaargh (7zip rant)
I feel your pain. Just slap a "Verifying contents of package..." progress indicator on it and let it take as long as it likes.
"Anyone can do any amount of work, provided it isn't the work he's supposed to be doing at that moment." -- Robert Benchley

Members I've personally met: RedStreblo, Goober5000, Sandwich, Splinter, Su-tehp, Hippo, CP5670, Terran Emperor, Karajorma, Dekker, McCall, Admiral Wolf, mxlm, RedSniper, Stealth, Black Wolf...