#2016 closed defect (duplicate)

Not enough available servers are found

Reported by: kapiteined
Owned by: daira
Priority: major
Milestone: 1.10.1
Component: code-peerselection
Version: 1.10.0
Keywords: servers-of-happiness upload error
Cc:
Launchpad Bug:

Description (last modified by zooko)

When uploading a file, it fails with the following error:

<class 'allmydata.interfaces.UploadUnhappinessError'>: shares could be placed on only 4 server(s) such that any 3 of them have enough shares to recover the file, but we were asked to place shares on at least 5 such servers. (placed all 5 shares, want to place shares on at least 5 servers such that any 3 of them have enough shares to recover the file, sent 6 queries to 6 servers, 4 queries placed some shares, 2 placed none (of which 2 placed none due to the server being full and 0 placed none due to an error))

There are 12 servers connected to this grid (pubgrid), yet only 6 queries are sent, and because two of those servers are full the upload fails (if I interpreted the error correctly).

Shouldn't there be another round of queries if the first round does not yield enough available servers?

Change History (9)

comment:1 in reply to: ↑ description Changed at 2013-07-05T19:57:05Z by kapiteined

Replying to kapiteined:

When uploading a file, it fails with the following error:

<class 'allmydata.interfaces.UploadUnhappinessError'>: shares could be placed on only 4 server(s) such that any 3 of them have enough shares to recover the file, but we were asked to place shares on at least 5 such servers. (placed all 5 shares, want to place shares on at least 5 servers such that any 3 of them have enough shares to recover the file, sent 6 queries to 6 servers, 4 queries placed some shares, 2 placed none (of which 2 placed none due to the server being full and 0 placed none due to an error))

There are 12 servers connected to this grid (pubgrid), yet only 6 queries are sent, and because two of those servers are full the upload fails (if I interpreted the error correctly).

Shouldn't there be another round of queries if the first round does not yield enough available servers?

Somehow attaching a file to this ticket failed, so I put the error report (incident-2013-07-05--19-34-13Z-7o6admq.flog.bz2) at URI:CHK:7tbpjhxokkmpere6nxwfa5cvey:37ypgfhpwg67veqpyhjve22edmh3w3jwpbds47yfnvjussvalmaq:3:5:74128 on the pubgrid.

comment:2 Changed at 2013-07-05T20:49:40Z by daira

Here's the most important part of the log:

local#675113 20:33:49.785: CHKUploader starting
local#675114 20:33:49.786: starting upload of <allmydata.immutable.upload.EncryptAnUploadable instance at 0x31a3378>
local#675115 20:33:49.786: creating Encoder <Encoder for unknown storage index>
local#675116 20:33:49.787: file size: 658086
local#675117 20:33:49.789: my encoding parameters: (3, 5, 5, 131073)
local#675118 20:33:49.790: got encoding parameters: 3/5/5 131073
local#675119 20:33:49.790: now setting up codec
local#675120 20:33:49.878: using storage index jbljj
local#675121 20:33:49.878: <Tahoe2ServerSelector for upload jbljj>(jbljj): starting
local#675122 20:33:49.927: <Tahoe2ServerSelector for upload jbljj>(jbljj): asking server psdgefgf for any existing shares
local#675123 20:33:49.954: <Tahoe2ServerSelector for upload jbljj>(jbljj): asking server 5sqtlw for any existing shares
local#675124 20:33:49.964: got result from [hrtib2], 0 shares
local#675125 20:33:49.965: but we're not running, so we'll ignore it
local#675126 20:33:49.966: _check_for_done, mode is 'MODE_READ', 2 queries outstanding, 2 extra servers available, 0 'must query' servers left, need_privkey=False
local#675127 20:33:49.967: but we're not running
local#675128 20:33:49.988: got result from [nszizg], 0 shares
local#675129 20:33:49.989: but we're not running, so we'll ignore it
local#675130 20:33:49.990: _check_for_done, mode is 'MODE_READ', 1 queries outstanding, 2 extra servers available, 0 'must query' servers left, need_privkey=False
local#675131 20:33:49.990: but we're not running
local#675132 20:33:50.083: <Tahoe2ServerSelector for upload jbljj>(jbljj): response to get_buckets() from server psdgefgf: alreadygot=()
local#675133 20:33:50.112: <Tahoe2ServerSelector for upload jbljj>(jbljj): response to get_buckets() from server 5sqtlw: alreadygot=()
local#675134 20:33:50.216: got result from [r7cddi], 0 shares
local#675135 20:33:50.217: but we're not running, so we'll ignore it
local#675136 20:33:50.218: _check_for_done, mode is 'MODE_READ', 0 queries outstanding, 2 extra servers available, 0 'must query' servers left, need_privkey=False
local#675137 20:33:50.219: but we're not running
local#675138 20:33:50.290: <Tahoe2ServerSelector for upload jbljj>(jbljj): response to allocate_buckets() from server i76mi6: alreadygot=(0,), allocated=()
local#675139 20:33:50.457: <Tahoe2ServerSelector for upload jbljj>(jbljj): response to allocate_buckets() from server lxmst5: alreadygot=(2,), allocated=(1,)
local#675140 20:33:50.667: <Tahoe2ServerSelector for upload jbljj>(jbljj): response to allocate_buckets() from server sf7ehc: alreadygot=(3,), allocated=()
local#675141 20:33:50.822: <Tahoe2ServerSelector for upload jbljj>(jbljj): response to allocate_buckets() from server ddvfcd: alreadygot=(4,), allocated=()
local#675142 20:33:50.839: <Tahoe2ServerSelector for upload jbljj>(jbljj): server selection unsuccessful for <Tahoe2ServerSelector for upload jbljj>:
 shares could be placed on only 4 server(s) such that any 3 of them have enough shares to recover the file, but we were asked to place shares on at least 5 such servers.
 (placed all 5 shares, want to place shares on at least 5 servers such that any 3 of them have enough shares to recover the file, sent 6 queries to 6 servers, 4 queries placed some shares, 2 placed none (of which 2 placed none due to the server being full and 0 placed none due to an error)),
 merged=sh0: i76mi6en, sh1: lxmst5bx, sh2: lxmst5bx, sh3: sf7ehcpn, sh4: ddvfcdns

comment:3 follow-up: Changed at 2013-07-05T20:59:39Z by daira

Here's my interpretation: with h = N = 5, as soon as the Tahoe2ServerSelector decides to put two shares on the same server (here sh1 and sh2 on lxmst5bx, per the merged= line in the log), the upload is doomed. The shares all have to be on different servers whenever h = N, but the termination condition is just that all shares have been placed, not that they have been placed in a way that meets the happiness condition.

If that's the problem, then #1382 should fix it. This would also explain why VG2 was unreliable with h close to N.
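
To make the diagnosis concrete, here is a minimal sketch that treats servers-of-happiness as the size of a maximum matching between shares and servers and evaluates it on the merged= placement from the log in comment:2. The function name servers_of_happiness and the dictionary layout are assumptions made for illustration; this is not the code in allmydata.

```python
def servers_of_happiness(placement):
    """Return the size of a maximum matching between shares and servers.

    placement maps a share number to the set of server ids holding it.
    Illustrative helper only; not Tahoe-LAFS's own implementation.
    """
    match = {}  # server id -> share currently matched to that server

    def try_assign(share, seen):
        for server in placement[share]:
            if server in seen:
                continue
            seen.add(server)
            # Use a free server, or displace its share if that share
            # can be matched to some other server instead.
            if server not in match or try_assign(match[server], seen):
                match[server] = share
                return True
        return False

    return sum(try_assign(share, set()) for share in placement)


# Placement copied from the "merged=" line of the log.
merged = {
    0: {"i76mi6en"},
    1: {"lxmst5bx"},
    2: {"lxmst5bx"},
    3: {"sf7ehcpn"},
    4: {"ddvfcdns"},
}

happiness = servers_of_happiness(merged)
print(happiness)  # 4, one short of the requested h = 5
```

Because sh1 and sh2 both live only on lxmst5bx, at most one of them can be matched, so the result is capped at 4 regardless of how the other shares are arranged.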


comment:4 in reply to: ↑ 3 Changed at 2013-07-05T21:03:15Z by zooko

Daira: excellent work diagnosing this!! Ed: thanks so much for the bug report. Daira: it looks like you are right, and I think this does explain those bugs that the volunteergrid2 people reported and that I never understood. Thank you!

comment:5 Changed at 2013-07-05T21:05:59Z by zooko

  • Description modified

comment:6 Changed at 2013-07-05T21:08:50Z by kapiteined

And to check whether that is the case, I changed to 3-7-10 encoding, and now the upload succeeds! Success: file copied

Does this call for a change in code, or for a big warning sticker: "don't choose h and N too close together"?
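
A rough way to see why 3-7-10 leaves room where 3-5-5 does not, under the simplifying assumption that each share ends up on exactly one server: in that case the happiness value is just the number of distinct servers used, so at most N - h shares can land on an already-used server before h becomes unreachable. Both helper names below are hypothetical.

```python
def happiness_single_copy(share_to_server):
    """Happiness when every share is stored on exactly one server.

    Under that assumption it equals the number of distinct servers used.
    """
    return len(set(share_to_server.values()))


def tolerated_duplicates(happy, total_shares):
    """Shares that may land on an already-used server while h stays reachable."""
    return total_shares - happy


print(tolerated_duplicates(5, 5))   # 0: any doubled-up share dooms the upload
print(tolerated_duplicates(7, 10))  # 3: up to three shares may double up
```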

comment:7 Changed at 2013-07-07T19:40:32Z by daira

We intend to fix it for v1.11 (Mark Berger's branch for #1382 already basically works), but there would be no harm in pointing out this problem on tahoe-dev in the meantime.

comment:8 follow-up: Changed at 2013-07-09T14:33:42Z by daira

  • Component changed from unknown to code-peerselection
  • Keywords servers-of-happiness upload error added
  • Milestone changed from undecided to 1.11.0
  • Priority changed from normal to major

Same bug as #1791?

comment:9 in reply to: ↑ 8 Changed at 2013-07-09T14:38:12Z by daira

  • Resolution set to duplicate
  • Status changed from new to closed

Replying to daira:

Same bug as #1791?

Yes, that bug also had h = N and two shares that were placed on the same server, so almost identical. I'll copy the conclusions here to that ticket.
