#1228 new enhancement

backupdb and ext4 i_version/generation xattributes

Reported by: warner Owned by: warner
Priority: normal Milestone: undecided
Component: code-encoding Version: 1.8.0
Keywords: performance Cc:
Launchpad Bug:

Description

I recently learned that several Linux filesystems can track version/generation numbers for local files. We could use this information in the backupdb to improve the speed and reliability of detecting files that have not been modified since the last backup.

lsattr -v FOO.txt shows a "version/generation number", and probably works for even old ext2 filesystems.
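As a rough illustration of how the same number might be read from Python: lsattr -v appears to obtain it through the FS_IOC_GETVERSION ioctl (called EXT2_IOC_GETVERSION in e2fsprogs). The sketch below assumes a Linux system and the ioctl number encoding used on x86 and ARM; other architectures encode the direction bits differently. The helper name get_inode_version is made up for this example. It returns None when the filesystem rejects the ioctl.

```python
import fcntl
import os
import struct


def _ior(type_char, nr, size):
    # Linux _IOR() encoding as found on x86 and ARM (assumption: other
    # architectures such as PowerPC or MIPS use different direction bits).
    return (2 << 30) | (size << 16) | (ord(type_char) << 8) | nr


# Declared in the kernel headers as _IOR('v', 1, long).
FS_IOC_GETVERSION = _ior("v", 1, struct.calcsize("l"))


def get_inode_version(path):
    """Return the version/generation number for path, or None if unavailable.

    Hypothetical helper, not part of Tahoe-LAFS.
    """
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        # The request is sized as a C long, but the kernel fills in an int,
        # so only the first four bytes of the buffer are decoded.
        buf = bytearray(struct.calcsize("l"))
        fcntl.ioctl(fd, FS_IOC_GETVERSION, buf)
        return struct.unpack("I", bytes(buf[:4]))[0]
    except OSError:
        # e.g. ENOTTY ("Inappropriate ioctl for device") on tmpfs
        return None
    finally:
        os.close(fd)
```

A None result only means the ioctl was refused. A numeric result does not by itself show that the filesystem updates the number on writes.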

this and this talk about "mounting a filesystem with i_version support", and suggest that the following ext4 extended attributes will become available:

  • file.crtime - actual file creation time
  • file.i_generation - inode generation number
  • file.i_version ("directories only") - inode data version number

It's not yet clear to me what information is really available, or how one might get to it (especially from Python), but this ticket is to remind me that "tahoe backup" would be a lot better if we could quickly and reliably determine that a file had not changed. The filesize+timestamp heuristic is useful, but it'd be nice to be able to do better. A real generation number would be ideal, if the kernel promises to update it reliably. A kernel-maintained hash of the filesystem contents would be great too (it would let us detect renames without reading the file contents).
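To make the idea concrete, here is a minimal sketch of a filesize+timestamp check that also consults a generation number when one is available. It is not the actual backupdb code: the Fingerprint record, the function names, and the optional get_generation callable are all hypothetical.

```python
import os
from collections import namedtuple

# Hypothetical record of what was seen at the last backup.
Fingerprint = namedtuple("Fingerprint", "size mtime_ns generation")


def fingerprint(path, get_generation=None):
    """Collect size, mtime, and (optionally) a generation number for path.

    get_generation is an assumed caller-supplied callable returning an int,
    or None when the filesystem offers nothing usable.
    """
    st = os.stat(path)
    gen = get_generation(path) if get_generation is not None else None
    return Fingerprint(st.st_size, st.st_mtime_ns, gen)


def probably_unchanged(old, new):
    """Apply the size+timestamp heuristic, tightened by generation if known."""
    if (old.size, old.mtime_ns) != (new.size, new.mtime_ns):
        return False
    if old.generation is not None and new.generation is not None:
        return old.generation == new.generation
    # No generation data on one side: fall back to the heuristic alone.
    return True
```

The generation number can only turn an "unchanged" verdict into "changed" here, so a filesystem that never updates it leaves the existing heuristic exactly as it was.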

Change History (4)

comment:1 in reply to: ↑ description Changed at 2010-10-20T04:01:42Z by davidsarah

Replying to warner:

It's not yet clear to me what information is really available, or how one might get to it (especially from Python), [...]

The patch you linked to would have made this info accessible via "extended attributes" for ext4 filesystems, but it doesn't look like that patch was accepted.

In general, anything you can do from C, you can do from Python using ctypes (if the FFI overhead is not an issue, as it probably isn't in this case).
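As a small illustration of the ctypes route, the sketch below calls the C library's getxattr() directly. It assumes Linux (the function has a different signature elsewhere). The name read_xattr and the attribute name in the usage note are hypothetical; nothing here assumes that the file.i_generation attributes from the unaccepted patch exist. Newer Python versions also provide os.getxattr on Linux, which would make ctypes unnecessary for this particular call.

```python
import ctypes
import ctypes.util
import errno
import os

# Load the C library; fall back to the symbols already in the process.
_libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)

# ssize_t getxattr(const char *path, const char *name, void *value, size_t size)
_libc.getxattr.argtypes = [
    ctypes.c_char_p,
    ctypes.c_char_p,
    ctypes.c_void_p,
    ctypes.c_size_t,
]
_libc.getxattr.restype = ctypes.c_ssize_t


def read_xattr(path, name, bufsize=256):
    """Return the raw bytes of extended attribute name on path, or None.

    Hypothetical helper: None means the attribute is absent or the
    filesystem does not support extended attributes.
    """
    buf = ctypes.create_string_buffer(bufsize)
    n = _libc.getxattr(os.fsencode(path), name.encode("utf-8"), buf, bufsize)
    if n < 0:
        err = ctypes.get_errno()
        if err in (errno.ENODATA, errno.ENOTSUP, errno.EOPNOTSUPP):
            return None
        raise OSError(err, os.strerror(err), path)
    return buf.raw[:n]
```

For example, read_xattr("FOO.txt", "user.example") would return the attribute's bytes if such an attribute had been set, and None otherwise.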

comment:2 follow-up: Changed at 2010-10-25T18:23:37Z by randombit

It may be hard to use this safely, though: on my local XFS filesystems, and remote ZFS mounts, lsattr -v returns a number, but doesn't update it on writes. It fails (bad ioctl for device type) on a tmpfs, which is far safer. It's not clear to me where the values reported on XFS and ZFS filesystems are coming from (it's not the inode number, at least). Trying to set the values (using chattr -v) also fails on these filesystems.

I like the idea of a kernel-maintained per-file hash cache. I may play with this in my copious free time.

comment:3 in reply to: ↑ 2 Changed at 2010-10-26T01:30:08Z by davidsarah

Replying to randombit:

It may be hard to use this safely, though: on my local XFS filesystems, and remote ZFS mounts, lsattr -v returns a number, but doesn't update it on writes. It fails (bad ioctl for device type) on a tmpfs, which is far safer.

We could potentially test whether this attribute is working correctly in a given directory subtree, by updating a test file and seeing whether it changes.
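A minimal sketch of such a probe, assuming the caller supplies a get_version callable that returns the number for a path (or None when it cannot be read). The function name version_tracks_writes and the probe file prefix are made up for this example.

```python
import os
import tempfile


def version_tracks_writes(directory, get_version):
    """Write to a scratch file in directory and report whether the number
    returned by get_version changed across the write.

    get_version is an assumed callable: path -> int or None.
    """
    fd, path = tempfile.mkstemp(dir=directory, prefix=".version-probe-")
    try:
        os.write(fd, b"a")
        os.fsync(fd)
        before = get_version(path)
        os.write(fd, b"b")
        os.fsync(fd)
        after = get_version(path)
    finally:
        os.close(fd)
        os.unlink(path)
    # Treat "unreadable" and "did not change" the same way: not trustworthy.
    return before is not None and after is not None and before != after
```

A positive result only covers the filesystem holding that directory, so a subtree that crosses mount points would need a probe per filesystem. It also shows only that one write changed the value, not that every kind of modification does.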

comment:4 Changed at 2012-04-01T05:05:17Z by davidsarah

  • Priority changed from major to normal