[tahoe-lafs-trac-stream] [Tahoe-LAFS] #2346: cloud backend uses lots of expensive LIST requests (was: cloud backend uses losts of expensive LIST requests)

Tahoe-LAFS trac at tahoe-lafs.org
Wed Dec 3 23:00:38 UTC 2014


#2346: cloud backend uses lots of expensive LIST requests
--------------------------------+------------------------------------------
      Reporter:  cloud_trouble  |      Owner:
          Type:  defect         |     Status:  new
      Priority:  normal         |  Milestone:  1.12.0
     Component:  code-storage   |    Version:  cloud-branch
    Resolution:                 |   Keywords:  cloud-backend S3 cost optimization
 Launchpad Bug:                 |
--------------------------------+------------------------------------------
Changes (by daira):

 * keywords:  cloud S3 cost optimization => cloud-backend S3 cost
               optimization
 * milestone:  undecided => 1.12.0


Old description:

> The cloud backend uses lots of expensive LIST requests with an Amazon S3
> bucket from heavy use of GET Bucket. The GET Bucket request is billed as
> a LIST request and is 10 times more expensive than a GET Object request.
>
> These LIST requests can be a large portion of the cost of using an S3
> backed storage node. For example, my logs show 1.5 times as many GET
> Bucket requests as GET Object requests (with two storage nodes, one S3
> bucket and one desktop computer) and the cost exceeds storage, transfer,
> and ec2 costs.
>
> Here is some relevant code:
> https://github.com/LeastAuthority/tahoe-lafs/blob/cloud-rebased/src/allmydata/storage/backends/cloud/cloud_common.py#L426
>
> And relevant chat on IRC:
>
> <daira1> the list of shares is stored in a local database called the
> leasedb. that was added recently on the cloud branch, so I suspect we're
> not making optimal use of it yet
> <daira1> ISTR that zooko was arguing for treating the leasedb as
> authoritative as to whether a share exists, and I was arguing against for
> a reason that I can't remember right now. there's a ticket about it
> <zooko> Yes, the arguments about the trade-offs of treating leasedb as
> authoritative vs. advisory are encoded into tickets.
> <zooko> I seem to recall that treating leasedb as authoritative gets nice
> performance, including for this particular aspect, while trading off some
> other values.

New description:

 When used with an Amazon S3 bucket, the cloud backend issues many
 expensive LIST requests because it makes heavy use of GET Bucket. The GET
 Bucket request is billed as a LIST request and is 10 times more expensive
 than a GET Object request.

 These LIST requests can be a large portion of the cost of using an S3
 backend storage node. For example, my logs show 1.5 times as many GET
 Bucket requests as GET Object requests (with two storage nodes, one S3
 bucket and one desktop computer), and the cost of these requests exceeds
 the storage, transfer, and EC2 costs.
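
 To make the two figures above concrete, here is a rough sketch of the
 arithmetic. It assumes only the numbers stated in this ticket (a LIST
 request priced at 10 times a GET Object request, and 1.5 LIST requests per
 GET Object request) and ignores every other request type; the function
 name and parameters are made up for illustration. Under those assumptions
 LIST requests account for 15 of every 16 units of request cost.

```python
def list_cost_share(list_per_get: float, list_price_ratio: float) -> float:
    """Fraction of (LIST + GET Object) request cost that is due to LIST.

    Costs are measured in units of one GET Object request, so the actual
    per-request price cancels out.
    """
    get_cost = 1.0                               # one GET Object request
    list_cost = list_per_get * list_price_ratio  # LIST cost per GET Object
    return list_cost / (list_cost + get_cost)


# Figures reported in this ticket: 1.5x as many LISTs, each 10x the price.
share = list_cost_share(list_per_get=1.5, list_price_ratio=10.0)
print(f"LIST share of request cost: {share:.4f}")
```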

 Here is some relevant code:
 https://github.com/LeastAuthority/tahoe-lafs/blob/cloud-rebased/src/allmydata/storage/backends/cloud/cloud_common.py#L426

 And relevant chat on IRC:

 <daira1> the list of shares is stored in a local database called the
 leasedb. that was added recently on the cloud branch, so I suspect we're
 not making optimal use of it yet
 <daira1> ISTR that zooko was arguing for treating the leasedb as
 authoritative as to whether a share exists, and I was arguing against for
 a reason that I can't remember right now. there's a ticket about it
 <zooko> Yes, the arguments about the trade-offs of treating leasedb as
 authoritative vs. advisory are encoded into tickets.
 <zooko> I seem to recall that treating leasedb as authoritative gets nice
 performance, including for this particular aspect, while trading off some
 other values.
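
 The authoritative vs. advisory distinction discussed above can be
 illustrated with a toy model. This is only a sketch, not the cloud
 branch's code: FakeBucket, ShareLocator, the table layout and the
 "shares/<storage_index>/<shnum>" key scheme are all hypothetical and do
 not reflect the real leasedb schema. In advisory mode every lookup issues
 a LIST request; in authoritative mode the local database answers alone.

```python
import sqlite3


class FakeBucket:
    """Hypothetical stand-in for an S3 bucket that counts LIST requests."""

    def __init__(self, keys):
        self.keys = set(keys)
        self.list_requests = 0

    def list_prefix(self, prefix):
        self.list_requests += 1  # each call models one GET Bucket (LIST)
        return sorted(k for k in self.keys if k.startswith(prefix))


class ShareLocator:
    """Hypothetical share lookup backed by a local SQLite table."""

    def __init__(self, bucket, authoritative):
        self.bucket = bucket
        self.authoritative = authoritative
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE shares (storage_index TEXT, shnum INTEGER, "
            "PRIMARY KEY (storage_index, shnum))"
        )

    def record_share(self, storage_index, shnum):
        self.db.execute(
            "INSERT OR IGNORE INTO shares VALUES (?, ?)", (storage_index, shnum)
        )

    def share_numbers(self, storage_index):
        if self.authoritative:
            # Trust the local database; no request is sent to the bucket.
            rows = self.db.execute(
                "SELECT shnum FROM shares WHERE storage_index = ? ORDER BY shnum",
                (storage_index,),
            )
            return [row[0] for row in rows]
        # Advisory: confirm against the bucket, at the cost of one LIST.
        prefix = f"shares/{storage_index}/"
        return sorted(int(k[len(prefix):]) for k in self.bucket.list_prefix(prefix))


bucket = FakeBucket(["shares/si1/0", "shares/si1/3"])

advisory = ShareLocator(bucket, authoritative=False)
advisory_result = advisory.share_numbers("si1")
lists_after_advisory = bucket.list_requests

authoritative = ShareLocator(bucket, authoritative=True)
authoritative.record_share("si1", 0)
authoritative.record_share("si1", 3)
authoritative_result = authoritative.share_numbers("si1")
lists_after_authoritative = bucket.list_requests
```

 In this toy model the authoritative lookup adds no LIST requests, but it
 returns whatever the database says even if the bucket contents have
 diverged from it.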

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2346#comment:1>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage

