#1836 closed defect

use leasedb (not crawler) to figure out how many shares you have and how many bytes — at Version 10

Reported by: zooko Owned by: davidsarah
Priority: normal Milestone: 1.15.0
Component: code-storage Version: 1.9.2
Keywords: leases garbage-collection test-needed accounting Cc:
Launchpad Bug:

Description (last modified by zooko)

In current trunk, there is a "BucketCountingCrawler" whose job it is to count up how many shares are stored.

I propose that this be replaced by using the leasedb to count files (a simple SQL COUNT query!), and at the same time to extend the storage server's abilities by letting it be able to add up the aggregate sizes of things as well as their number.

This is part of an "overarching ticket" to eliminate most uses of crawler — ticket #1834.

Change History (10)

comment:1 Changed at 2012-10-30T23:12:45Z by zooko

  • Description modified (diff)

comment:2 Changed at 2012-10-30T23:14:24Z by zooko

The part about reporting total space usage would be very useful for customers of LeastAuthority.com (who pay per byte), among others.

Last edited at 2012-10-31T10:07:37Z by zooko (previous) (diff)

comment:3 Changed at 2012-10-31T00:09:16Z by davidsarah

  • Owner set to davidsarah
  • Status changed from new to assigned

+1.

comment:4 Changed at 2012-10-31T10:08:04Z by zooko

  • Summary changed from stop crawling share files in order to figure out how many shares you have to use leasedb (not crawler) to figure out how many shares you have and how many bytes

comment:5 Changed at 2012-11-09T06:51:58Z by zooko

Using leasedb this way would facilitate solving #671 — bring back sizelimit (i.e. max consumed, not min free).

comment:6 Changed at 2012-11-21T00:49:35Z by zooko

  • Description modified (diff)

comment:7 Changed at 2012-12-14T20:24:43Z by zooko

Using leasedb this way would facilitate solving #940.

comment:8 Changed at 2012-12-15T00:59:41Z by davidsarah

The most basic form of the 'total used space' query is

SELECT SUM(`used_space`) FROM `shares`

How much account-specific information should we add? At the moment, there are only two accounts -- anonymous and starter -- but that is already enough to introduce the complication that more than one account can hold a lease on the same share, so the query above is not equivalent to

SELECT SUM(`used_space`) FROM `shares` s JOIN `leases` l
       ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`)

since that can count space for a share more than once.

comment:9 Changed at 2012-12-15T01:09:38Z by davidsarah

This query solves the above problem, giving the total number of leased shares and the total space used by leased shares:

SELECT COUNT(*), SUM(`used_space`)
  FROM (SELECT `used_space`
          FROM `shares` s JOIN `leases` l
          ON (s.`storage_index` = l.`storage_index` AND s.`shnum` = l.`shnum`)
          GROUP BY s.`storage_index`, s.`shnum`)

(Any WHERE clause can be added to the inner SELECT to pick leases that satisfy certain criteria.)

And this gives the number of shares and total used space leased by each account, sorted beginning with the one that is using most space:

SELECT `account_id`, COUNT(*), SUM(`used_space`)
  FROM `leases` l LEFT JOIN `shares` s
  ON (l.`storage_index` = s.`storage_index` AND l.`shnum` = s.`shnum`)
  GROUP BY `account_id` ORDER BY SUM(`used_space`) DESC

comment:10 Changed at 2013-07-04T16:23:57Z by zooko

  • Description modified (diff)

After talking with markberger today, I realized that #1818 is the ticket to merge leasedb into trunk, and #1819 is the superceding ticket to merge leasedb+cloud-backend into trunk.

Note: See TracTickets for help on using tickets.