Exploring Time Machine's HFS+ Filesystem

There are situations where hardware fails. When it's a GPU or RAM, it's not a big deal: we swear a little, then go to the store and buy a replacement. Hard disks, however, are another story. The dreaded possibility of data loss makes us duplicate our data across disks, RAIDs, the network if possible - the more copies, the better. After all, having a backup saves us from the inevitable.

And there are situations when the backup fails, too. The initial shock makes everyone silent. Then the first person starts to sob. My data, where's my data? - he cries. More and more people join in, and then the whole world collapses. Crime rates rise rapidly. People leave their houses wide open and run amok. Brutality and aggression spread through the population. No one is safe now. Not even wealth can spare you from the devastating tragedy of broken backup storage.

I've been there. My laptop broke. I lost about a month of data that I hadn't backed up since the last time, but I believed the backup disk was healthy. It wasn't. The disk was still accessible, but it had some critical bad sectors, to the point that Spotlight reset the disk every time it tried to index data on it. Since my Mac was not in the mood to repair the disk, I needed to find another way.

What I have learned

First lesson: try to work things out with the Mac itself. At the beginning I got to the core of the problem: why was my disk spinning down every 30 seconds after spinning up? This problem was only present on the Mac. I tried reading the disk on Ubuntu and there were no problems, so I came to the conclusion that Mac OS X was trying to read some file from a bad sector.

I typed

sudo dmesg

and voila - all the debug messages pointed to Spotlight repeatedly trying to update its index and read something from the disk. And because it's a really good idea not to read any unnecessary data from an already broken disk, I quickly found out how to disable this.

There is a question with a very good answer on how to do it. Of the two ways given, I chose

sudo mdutil -a -i off

The corrupted disk seemed to block on this command, but after unplugging the disk, running the command and plugging the disk in once more, I found no more spin-downs. One problem solved, but there were more to go.

Disk Utility didn't even want to touch my disk. All it said was that the disk is broken beyond repair. I tried Ubuntu instead. It did read the filesystem and seemed to be less picky about the disk. I tried to run fsck.hfsplus on the partition, but it told me that I should disable journaling before doing so. By the way, to run fsck on HFS+ under Ubuntu you need to install hfsprogs:

sudo apt-get install hfsprogs

Back on the Mac, there was a command to disable journaling. Be sure to run it on the affected partition.

sudo diskutil disableJournal /dev/disk1s1

And it returned some funny error:

An error occurred journaling the file system: The underlying task
reported failure on exit (-69860)

Some sources said you could ignore that error - it still disables journaling. Not in my case. Well, is there another option? There is. And man, it's hackish.

(From now on I won't write sudo everywhere. Pretty much every command below needs to be run as root.)

The hackish option is a small C program, disable_journal.c, that turns journaling off by hand. I compiled it on my Mac and ran it:

$ gcc disable_journal.c
$ ./a.out /dev/disk1s1
Segmentation fault 11
$

YEAH. Sounds like fun. I gave it another try, this time under a debugger:

$ brew install homebrew/dupes/gdb
$ gcc -g disable_journal.c
$ gdb a.out
(gdb) break main
(gdb) run /dev/disk1s1
(gdb) step
...

I found that mmap(2) does not work on my device, and that a subsequent check is written incorrectly for 64-bit systems (mine is one), so it segfaults.

On the aforementioned page, the first comment addresses this. The author said that we can't mmap this device and suggested using dd instead, which I tried next.

$ dd if=/dev/disk1s1 of=lol bs=512 count=4
Input/Output error

Great. Just great. Apparently I can't do anything on my Mac. So I switched to Ubuntu. The code had been ported and described as working in this thread.

$ dd if=/dev/sdb1 of=lol bs=512 count=4
$ ./a.out lol
$ dd if=lol of=/dev/sdb1 bs=512 count=4
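For the curious, the patch those four sectors receive is tiny. The HFS+ volume header sits 1024 bytes into the partition, and journaling is advertised by a single bit (kHFSVolumeJournaledBit, bit 13) in the header's 32-bit big-endian attributes field. Here is a minimal Python sketch of that step - my reconstruction of what the ported C code does, not the actual code from the thread:

```python
import struct

K_HFS_VOLUME_JOURNALED_BIT = 13  # per Apple's hfs_format.h

def clear_journaled_bit(dump):
    """Clear the journaled flag in an HFS+ volume header.

    `dump` is a bytearray holding at least the first 2048 bytes of the
    partition (the four 512-byte sectors read with dd). The volume
    header starts at offset 1024; `attributes` is the big-endian
    32-bit word 4 bytes into the header.
    """
    if dump[1024:1026] not in (b'H+', b'HX'):
        raise ValueError('no HFS+/HFSX volume header at offset 1024')
    (attrs,) = struct.unpack_from('>I', dump, 1024 + 4)
    struct.pack_into('>I', dump, 1024 + 4,
                     attrs & ~(1 << K_HFS_VOLUME_JOURNALED_BIT))
    return dump
```

Applied to the dump file between the two dd runs, this is essentially all that "disabling journaling" amounts to; the real patcher may also touch other header fields (journalInfoBlock, lastMountedVersion), but flipping the attribute bit is the crucial part.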

It worked! It disabled journaling as expected, and fsck.hfsplus did not complain either:

$ fsck.hfsplus /dev/sdb1

Weird - it checked everything and reported OK. I quickly checked whether there were other options that could help me. There was one for rebuilding the catalog B-tree, so out of curiosity I tried it:

$ fsck.hfsplus -r /dev/sdb1

The rebuild did find some errors in the structure, recovered from them and reported OK.

The truth behind Time Machine

So I started to rsync the backups:

$ rsync /media/EVA00/Backups.backupdb/Latest recovered-backup
Input/Output error

Fortunately, the previous backup from an hour before Latest was working correctly...

$ rsync /media/EVA00/Backups.backupdb/2012-xx-xx recovered-backup
...
Users/Macbook/Music

WAIT. Why are there no files under the Music directory? Why so few files at all? And why is Music an empty file and not a directory?

And what is this HFS+ Private Directory Data\r, with A LOT of directories containing my data inside? Is this some post-recovery HFS+ equivalent of the lost+found directory on Linux? I searched the Internet and read some explanations. This one was particularly good. So HFS+ uses a hack. A very dirty hack. It uses a magic directory to store the targets of hardlinked files and directories. And pretty much everything in a Time Machine backup is hardlinked.

I wouldn't care, but for some reason my Ubuntu did not support that magic and couldn't resolve the links properly, leaving a half-broken structure to handle manually. And that would be a pain in the ass: there were about 90 000 directories in the Private Directory Data. However, there is hope.

This Gist is a powerful machine that restores the real file hierarchy from a Time Machine backup on Linux. I found two changes necessary:

  1. Because of disk read failures, I removed the set -e line from the script.
  2. I prefixed the stat invocation with LANG=C, because it returned localized strings that the script did not recognize.
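At its core, what the script has to do is replace each directory-hard-link placeholder with the contents of the matching dir_<CNID> folder inside the magic directory. A rough Python sketch of that one step - a hypothetical helper, with the CNID lookup (which the Gist digs out of stat output, and which is the driver-specific part) taken as a parameter rather than reimplemented:

```python
import shutil
from pathlib import Path

# Magic folder at the volume root; note the trailing carriage
# return really is part of its on-disk name.
PRIVATE_DIR = 'HFS+ Private Directory Data\r'

def materialize_dir_link(volume_root, placeholder, cnid, dest):
    """Replace one Time Machine directory-hard-link placeholder.

    `cnid` is the catalog node ID the placeholder points at; the real
    directory it names is dir_<cnid> inside the magic folder. Copies
    that directory to `dest` under the placeholder's name.
    """
    src = Path(volume_root) / PRIVATE_DIR / f'dir_{cnid}'
    target = Path(dest) / Path(placeholder).name
    shutil.copytree(src, target)
    return target
```

Walk a snapshot, call something like this for every placeholder, and the real hierarchy reappears; the Gist does the equivalent in shell, including recursing into the restored directories, since they can themselves contain more placeholders.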

After that, I tried restoring some backups to another disk on my Ubuntu. It finished successfully, copied a whole bunch of files, and my data is once more secure. This time on a RAID 1 setup. :)