#2472 new defect

encrypted cloud database

Reported by: zooko Owned by: daira
Priority: normal Milestone: undecided
Component: unknown Version: 1.10.1
Keywords: Cc:
Launchpad Bug:

Description

Tahoe-LAFS does a reasonable job of flat-file storage, and of directories structure. But, kids these days (for the last few decades, I mean) are really into structured storage, i.e. relational databases, queriable nosql databases, etc.

Here's a proposal for a stab at a "Minimum Viable Product" for an end-to-end encrypted cloud database. It's extremely simple: store a sqlite db in Tahoe-LAFS.

This would immediately give off-site storage (possibly even peer-to-peer if the underlying Tahoe-LAFS grid is a peer-to-peer grid), erasure-coding for redundancy, and it would also immediately give Tahoe-LAFS's nice access-control semantics: you can give people read-only access to your sqlitedb.

A potentially interesting use for this would be to store Tahoe-LAFS caps in the sqlitedb so that you can query them out. ☺

There are a few important details about how to map sqlite's storage needs to Tahoe-LAFS's storage offerings for best performance and to retain Tahoe-LAFS's guarantees about access control and atomicity and so forth. I looked into it at one point about a year ago, and unfortunately didn't post notes to the trac so I don't remember precisely, what I decided, but I think it was that the sqlitedb should be in write-ahead-logging WAL mode (https://www.sqlite.org/wal.html), and with exclusive locking mode, and should be stored a single MDMF file with its segment size set to be the same as the sqlitedb's page size.

The -wal file should probably also be an MDMF, although it would be cool if sqlite happened to use it in write-once mode, in which case maybe it could be an immutable.

There's an open issue about whether read-only access to such a DB would work without PRAGMA journal_mode=DELETE. Read https://www.sqlite.org/wal.html#readonly to see what I mean, and keep in mind that because we're telling the user that they have to set exclusive locking mode: https://www.sqlite.org/wal.html#noshm

With this setup, the cap to the database has to be a cap to the directory containing the sqlitedb file, not a cap to the sqlitedb file itself. That's because sqlite needs to access the -wal file adjacent to the sqlitedb file itself.

A different approach would be to use the older rollback-log functionality of sqlite instead of WAL. The trade-offs listed in https://www.sqlite.org/wal.html make it sound like maybe that would fit better into Tahoe-LAFS. It might require experimentation and benchmarking to understand.

But also it requires careful study of things like https://www.sqlite.org/lockingv3.html#how_to_corrupt and https://www.sqlite.org/atomiccommit.html#sect_9_0 to figure out if Tahoe-LAFS could provide the guarantees that sqlite needs. I think we can, and I tentatively think that the WAL is easier to guarantee than the rollback journal, because with the rollback journal there is a positive requirement to preserve and make available any hot journal, or else corruption can result, whereas with a WAL a failure of preservation or availability of the -wal just results in rollback, not corruption. I think.

Change History (0)

Note: See TracTickets for help on using tickets.