[tahoe-lafs-trac-stream] [Tahoe-LAFS] #2472: encrypted cloud database

Tahoe-LAFS trac at tahoe-lafs.org
Tue Jul 21 18:53:02 UTC 2015


#2472: encrypted cloud database
---------------------+---------------------------
 Reporter:  zooko    |          Owner:  daira
     Type:  defect   |         Status:  new
 Priority:  normal   |      Milestone:  undecided
Component:  unknown  |        Version:  1.10.1
 Keywords:           |  Launchpad Bug:
---------------------+---------------------------
 Tahoe-LAFS does a reasonable job of flat-file storage, and of directories
 structure. But, kids these days (for the last few decades, I mean) are
 really into ''structured storage'', i.e. relational databases, queriable
 nosql databases, etc.

 Here's a proposal for a stab at a "Minimum Viable Product" for an end-to-
 end encrypted cloud database. It's extremely simple: store a sqlite db in
 Tahoe-LAFS.

 This would immediately give off-site storage (possibly even peer-to-peer
 if the underlying Tahoe-LAFS grid is a peer-to-peer grid), erasure-coding
 for redundancy, and it would also immediately give Tahoe-LAFS's nice
 access-control semantics: you can give people read-only access to your
 sqlitedb.

 A potentially interesting use for this would be to store Tahoe-LAFS caps
 in the sqlitedb so that you can query them out. ☺

 There are a few important details about how to map sqlite's storage needs
 to Tahoe-LAFS's storage offerings for best performance and to retain
 Tahoe-LAFS's guarantees about access control and atomicity and so forth. I
 looked into it at one point about a year ago, and unfortunately didn't
 post notes to the trac so I don't remember precisely, what I decided, but
 I think it was that the sqlitedb should be in write-ahead-logging `WAL`
 mode (https://www.sqlite.org/wal.html), and with exclusive locking mode,
 and should be stored a single MDMF file with its segment size set to be
 the same as the sqlitedb's page size.

 The `-wal` file should probably also be an MDMF, although it would be cool
 if sqlite happened to use it in write-once mode, in which case ''maybe''
 it could be an immutable.

 There's an open issue about whether read-only access to such a DB would
 work without `PRAGMA journal_mode=DELETE`. Read
 https://www.sqlite.org/wal.html#readonly to see what I mean, and keep in
 mind that because we're telling the user that they have to set exclusive
 locking mode: https://www.sqlite.org/wal.html#noshm

 With this setup, the cap to the database has to be a cap to the directory
 ''containing'' the sqlitedb file, not a cap to the sqlitedb file itself.
 That's because sqlite needs to access the `-wal` file adjacent to the
 sqlitedb file itself.

 A different approach would be to use the older rollback-log functionality
 of sqlite instead of WAL. The trade-offs listed in
 https://www.sqlite.org/wal.html make it sound like maybe that would fit
 better into Tahoe-LAFS. It might require experimentation and benchmarking
 to understand.

 But also it requires careful study of things like
 https://www.sqlite.org/lockingv3.html#how_to_corrupt and
 https://www.sqlite.org/atomiccommit.html#sect_9_0 to figure out if Tahoe-
 LAFS could provide the guarantees that sqlite needs. I think we can, and I
 tentatively think that the `WAL` is easier to guarantee than the rollback
 journal, because with the rollback journal there is a positive requirement
 to ''preserve'' and make ''available'' any hot journal, or else corruption
 can result, whereas with a `WAL` a failure of preservation or availability
 of the `-wal` just results in rollback, not corruption. I think.

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2472>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage


More information about the tahoe-lafs-trac-stream mailing list