Author Topic: Crash resistant file systems  (Read 1935 times)

0 Members and 1 Guest are viewing this topic.

Offline jr2

  • The Mail Man
  • 212
  • It's prounounced jayartoo 0x6A7232
    • Steam
Crash resistant file systems
http://www.techrepublic.com/article/crash-resistant-data-storage-is-now-possible-say-mit-researchers/

Quote

By Michael Kassner | September 16, 2015, 5:56 AM PST

ERROR: NTFS FILE SYSTEM (0x00000024) and the ensuing Blue Screen of Death (BSOD) can mean many things, usually the most painful being lost data. Thankfully, operating and file systems have improved over the years, and BSODs are less common; however, they're still around and causing angst.

After a file-system crash, think how much better it would be to see "You must restart your computer. But, all of your data has been saved."

A Massachusetts Institute of Technology (MIT) press release by Larry Hardesty stated researchers at the university will present a paper on October 4, 2015 at the ACM Symposium on Operating Systems Principles about file-system crashes. The presentation should be of interest to everyone who uses a computer. It's about the world's first file system that's mathematically guaranteed not to lose track of data during a crash.

"Although the file system is slow by today's standards, the techniques the researchers used to verify its performance can be extended to more sophisticated designs," added Hardesty.

Since operating systems and file systems crash, to avoid confusion, let's delineate between the two. Operating systems control hardware and software resources; file systems identify, store, and retrieve data.

The MIT research team focused on crashes affecting the file system. Nickolai Zeldovich, one of the three making the October presentation and an associate professor of computer science and engineering at MIT told Hardesty, "What many people worry about is building these file systems to be reliable, both when they're operating normally but also in the case of crashes, power failures, software bugs, hardware errors, and what have you."

Hardesty mentioned that figuring out how to recover from a file-system crash is complicated. There are so many different places the code could fail. "You have to consider every instruction or every disk operation and think what if I crash now...," explained Zeldovich. "So empirically, people have found lots of bugs in file systems that have to do with crash recovery, and they keep finding them, even in very well tested file systems, because it's just so hard to do."

That said, the researchers feel they have a solution. Zeldovich along with Frans Kaashoek, professor of electrical engineering and computer science (EECS), Adam Chlipala, associate professor of computer science, Haogang Chen, a graduate student in EECS, and Daniel Ziegler, an undergraduate in EECS, used formal verification to prove their file system does not lose track of data during a crash.

Simply put, formal verification is the process of checking whether a design satisfies specified requirements (properties), and, in this case that means not losing data. "It's a complicated process, so it's generally applied only to very high-level schematic representations of a program's functionality," wrote Hardesty. "Translating high-level schema into working code, however, can introduce myriad complications that the proofs don't address."

To avoid those pitfalls, the MIT researchers ran a formal verification on the file system's final code, which is much more difficult. "This formal proving environment includes a programming language," Chlipala explained. "So we implement the file system in the same language where we're writing the proofs. And the proofs are checked against the actual file system, not some whiteboard idealization that has no formal connection to the code."

Because of their stipulations, the team needed to input much more data than typically required — for example, cataloging all the components of the file system, and then describing all component relationships that occur during a file-system crash. "In the course of writing the file system, they (researchers) repeatedly went back and retooled the system specifications," said Hardesty. "In the course of writing the file system, they (researchers) repeatedly went back and retooled the system specifications," said Hardesty. "But even though they rewrote the file system 'probably 10 times,' Zeldovich says, Kaashoek estimates they spent 90 percent of their time on the definitions of the system components and the relationships between them."

Researcher Kaashoek summed it up by saying, "No one had done it. It's not like you could look up a paper that says, 'This is the way to do it.' But now you can read our paper and presumably do it a lot faster."

Diving deeper into what the MIT team accomplished would be above my pay grade. However, those who understand what this means have taken notice. Hardesty talked to Ulfar Erlingsson, lead manager of security research at Google. "It's not like people haven't proven things in the past," noted Erlingsson. "But usually the methods and technologies, the formalisms that were developed for creating the proofs, were so esoteric and so specific to the problem that there's hardly any chance there would be repeat work built on it."

Erlingsson added, "But I can say for certain... this is stuff that's going to get built on and applied in many different domains. That's what's so exciting."

 

Offline The E

  • He's Ebeneezer Goode
  • 213
  • Nothing personal, just tech support.
    • Steam
    • Twitter
Re: Crash resistant file systems
It's a paper I am really looking forward to reading (even though I will probably not comprehend it :P). I am particularly interested in seeing what kinds of failures this thing is actually able to protect against; I find it hard to believe that sudden catastrophic power failure is a problem that can be solved in software.....
If I'm just aching this can't go on
I came from chasing dreams to feel alone
There must be changes, miss to feel strong
I really need lifе to touch me
--Evergrey, Where August Mourns

 

Offline zookeeper

  • *knock knock* Who's there? Poe. Poe who?
  • 210
Re: Crash resistant file systems
I find it hard to believe that sudden catastrophic power failure is a problem that can be solved in software.....

Why couldn't it? For example, no matter what you're writing to the disk, you can always make it atomic so that you can never end up in an inconsistent state no matter what point the power failure occurs at.

EDIT: Except if for example the hardware claims that it's written the data already but it's actually still merely cached/buffered, which I presume is very common.
« Last Edit: September 17, 2015, 09:44:20 am by zookeeper »

 

Offline jr2

  • The Mail Man
  • 212
  • It's prounounced jayartoo 0x6A7232
    • Steam
Re: Crash resistant file systems
It shouldn't claim that until the cache gets flushed, you would think... :ick:

Press release link has some interesting comments and more details than that article:

http://news.mit.edu/2015/crash-tolerant-data-storage-0824

 

Offline Ghostavo

  • 210
  • Let it be glue!
    • Skype
    • Steam
    • Twitter
Re: Crash resistant file systems
I thought journaling had solved this issue decades ago.
"Closing the Box" - a campaign in the making :nervous:

Shrike is a dirty dirty admin, he's the destroyer of souls... oh god, let it be glue...

  
Re: Crash resistant file systems
This basically sounds like they wrote a journalling filesystem in Coq. They haven't actually invented any new techniques to avoid data loss, they've just implemented the existing ones and proven their correctness.
The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell.

 

Offline jr2

  • The Mail Man
  • 212
  • It's prounounced jayartoo 0x6A7232
    • Steam
Re: Crash resistant file systems
I thought it was more along the lines of if the failsafes in the journaling fail (??), the data would still be safe?  Not sure exactly though, I just assumed that they weren't talking about re-inventing the wheel.  More like inventing run-flat tires when tires are all puncture-resistant anyways?  ¯\_(ツ)_/¯

 
Re: Crash resistant file systems
It's more like mathematically proving that your tyres really are puncture-resistant.
The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell.

 

Offline jr2

  • The Mail Man
  • 212
  • It's prounounced jayartoo 0x6A7232
    • Steam
Re: Crash resistant file systems
Ah, I see.  Well, that's good to know I suppose... why wasn't that done when they first came out with journaling filesystems??

 

Offline AdmiralRalwood

  • 211
  • The Cthulhu programmer himself!
    • Skype
    • Steam
    • Twitter
Re: Crash resistant file systems
Ah, I see.  Well, that's good to know I suppose... why wasn't that done when they first came out with journaling filesystems??
Because math is hard.
Quote
"It's a complicated process, so it's generally applied only to very high-level schematic representations of a program's functionality," wrote Hardesty.
Ph'nglui mglw'nafh Codethulhu GitHub wgah'nagl fhtagn.

schrödinbug (noun) - a bug that manifests itself in running software after a programmer notices that the code should never have worked in the first place.

When you gaze long into BMPMAN, BMPMAN also gazes into you.

"I am one of the best FREDders on Earth" -General Battuta

<Aesaar> literary criticism is vladimir putin

<MageKing17> "There's probably a reason the code is the way it is" is a very dangerous line of thought. :P
<MageKing17> Because the "reason" often turns out to be "nobody noticed it was wrong".
(the very next day)
<MageKing17> this ****ing code did it to me again
<MageKing17> "That doesn't really make sense to me, but I'll assume it was being done for a reason."
<MageKing17> **** ME
<MageKing17> THE REASON IS PEOPLE ARE STUPID
<MageKing17> ESPECIALLY ME

<MageKing17> God damn, I do not understand how this is breaking.
<MageKing17> Everything points to "this should work fine", and yet it's clearly not working.
<MjnMixael> 2 hours later... "God damn, how did this ever work at all?!"
(...)
<MageKing17> so
<MageKing17> more than two hours
<MageKing17> but once again we have reached the inevitable conclusion
<MageKing17> How did this code ever work in the first place!?

<@The_E> Welcome to OpenGL, where standards compliance is optional, and error reporting inconsistent

<MageKing17> It was all working perfectly until I actually tried it on an actual mission.

<IronWorks> I am useful for FSO stuff again. This is a red-letter day!
* z64555 erases "Thursday" and rewrites it in red ink

<MageKing17> TIL the entire homing code is held up by shoestrings and duct tape, basically.