don't place shares on servers that already have shares #2107

Open
opened 2013-11-14 22:50:04 +00:00 by zooko · 32 comments

I think we should change it so that the Upload Strategy doesn't bother uploading a share to a server if that server already has a share of that file. I used to think it was worth uploading "extra" shares to a server that already had a share, because there are some (probably rare) cases where it can help with file recovery, and it never hurts file recovery, but I've changed my mind.

My main reason for changing my mind about this is that Diego "sickness" Righi is confused and dismayed by this behavior (e.g. #2106), and if a feature confuses and dismays users, then it is probably not worth it. Consider also [Kevan Carstensen's master's thesis on Upload Strategy Of Happiness](https://zooko.com/uri/URI%3ADIR2-CHK%3Aooyppj6eshxwmweeelqm3x54nq%3Au5pauln65blikfn5peq7e4s7x5fwdvvhvsklmfmwbjxlvlosldcq%3A1%3A1%3A105588/Carstensen-2011-Robust_Resource_Allocation_In_Distributed_Filesystem.pdf), which says:

This means we are not allowed to store any erasure coded share more than once.

The reasoning for this requirement is less straightforward than that for spreading shares out amongst many storage servers. Note that we can achieve an availability improvement by storing more data. If r = 4, and we store five replicas on five distinct servers, then the file can tolerate the loss of four servers instead of five before read availability is compromised. Using selective double placement of shares in an Fe-style filesystem allows us to tolerate the failure of n − k + 1 or more storage servers.

This requirement is more for usability and consistency than any clear availability criteria. Space utilization in distributed filesystems is an important issue. Many commodity computing services charge based on the amount of space used. So, in a practical distributed system, it is important for the user to be able to reason about space usage in a precise way. Explicit erasure-coding or replication parameters provided to the user allow the user to do this. We argue that it is not appropriate for an algorithm to second-guess the user’s choices, and say instead that the user will increase n, k, or r if they want more data stored on the filesystem.

That's a pretty good argument! Especially the part about "You're paying for that space, and if you upload 2 or 3 shares to one server, you're paying 2 or 3 times as much, but not getting much fault-tolerance in return for it." See also [this post](https://tahoe-lafs.org/pipermail/tahoe-dev/2013-September/008694.html) on tahoe-dev.

I would add that if a user is diagnosing the state of their grid, or reasoning about possible future behavior of their grid, it will be more intuitive and easier for them (at least for many people) to think about if they can assume that shares will never be intentionally doubled-up.

zooko added the
c/code-peerselection
p/normal
t/enhancement
v/1.10.0
labels 2013-11-14 22:50:04 +00:00
zooko added this to the undecided milestone 2013-11-14 22:50:04 +00:00
Author

I guess another reason to make this change is that Kevan's master's thesis assumes it (see the quote in the ticket description), so making this change would make our implementation match his analysis more precisely.

I think you're misinterpreting the thesis. It says not to put a given share on the same server if it already exists there; not that upload or repair can't create shares that are duplicates of existing shares but on different servers that already have (other) shares. The latter is clearly necessary to achieve happiness. (Proof: suppose that one server has all shnums, the remaining servers have shnum 0, and shares.happy > 2.)

To be clear, I think you're misinterpreting the word "store" in the first sentence of the thesis quote. It means "place on a given server", not "end up with on a given server".

Author

Replying to daira:

I think you're misinterpreting the thesis. It says not to put a given share on the same server if it already exists there; not that upload or repair can't create shares that are duplicates of existing shares but on different servers that already have (other) shares. The latter is clearly necessary to achieve happiness. (Proof: suppose that one server has all shnums, the remaining servers have shnum 0, and shares.happy > 2.)

Good point. Here are some possible rules:

• (bad) Don't put a share on a server if that server already has a different share. Sometimes, as Daira points out, we would have to break that rule in order to achieve Servers-of-Happiness.

• (bad) Don't put a share on a server if that share is already on another server. The same example Daira gave means we would also have to break this rule in order to achieve Servers-of-Happiness.

• (good?) Don't put a share on a server unless doing so would increase the Servers-of-Happiness-Level.

• (good?) Don't put a share on a server unless doing so is a necessary placement in an upload placement that would reach the required Servers-of-Happiness-Level H.

Replying to zooko:

[...] Here are some possible rules:

[numbering added for ease of reference]

  1. (bad) Don't put a share on a server if that server already has a different share. Sometimes, as Daira points out, we would have to break that rule in order to achieve Servers-of-Happiness.

  2. (bad) Don't put a share on a server if that share is already on another server. The same example Daira gave means we would also have to break this rule in order to achieve Servers-of-Happiness.

  3. (good?) Don't put a share on a server unless doing so would increase the Servers-of-Happiness-Level.

  4. (good?) Don't put a share on a server unless doing so is a necessary placement in an upload placement that would reach the required Servers-of-Happiness-Level H.

I'm confused as to why we're trying to redesign this now. Doesn't the algorithm based on Kevan's thesis that Mark has already implemented (modulo some minor details) already satisfy 3?

Rule 4 seems ambiguous if "necessary" means "necessary to achieve happiness". The problem is that there can be multiple ways to achieve happiness using different subsets of a given share placement, and none of these may be "necessary" given that the others exist.

Author

Okay, so can we all agree on Rule 3?

Rule 3: "Don't put a share on a server unless doing so would increase the Servers-of-Happiness-Level."

Well, no, that doesn't make sense either. Because suppose you have K=3, N=6 and there are 6 servers. Well, putting a share on the first server will not increase the happiness level! It will be 0 both before and after placing that one share.

So is there a rule which satisfies sickness's (and now my) intuition that it is confusing and usually-not-useful to add shares to a server that already has shares?

Replying to zooko (comment:7):

Well, no, that doesn't make sense either. Because suppose you have K=3, N=6 and there are 6 servers. Well, putting a share on the first server will not increase the happiness level! It will be 0 both before and after placing that one share.

No, it will be 1 after placing the share.

It was intentional that the definition of happiness distinguishes between happiness levels less than K, even though none of them are sufficient for the file to be recoverable (this was an advantage of the maximum matching formulation over the "number of servers, the survival and correct function of which will guarantee that your file is available" definition that preceded it; see /tahoe-lafs/trac/issues/26722#comment:162).

Author

Replying to daira (comment:8):

Replying to zooko (comment:7):

Well, no, that doesn't make sense either. Because suppose you have K=3, N=6 and there are 6 servers. Well, putting a share on the first server will not increase the happiness level! It will be 0 both before and after placing that one share.

No, it will be 1 after placing the share.

It was intentional that the definition of happiness distinguishes between happiness levels less than K, even though none of them are sufficient for the file to be recoverable (this was an advantage of the maximum matching formulation over the "number of servers, the survival and correct function of which will guarantee that your file is available" definition that preceded it; see /tahoe-lafs/trac/issues/26722#comment:162).

Wait, what? I thought "Happiness" was defined as "The size of the largest set of servers such that any K-sized subset of it can recover your file". That number is 0 before and after uploading 1 share of a K=3 file. What is the definition of "Happiness"?

Servers of happiness is not a boolean value. It is a measurement of how effectively shares are distributed. For each distinctive server with any nonzero number of shares, the servers of happiness value increases by 1. In terms of how we model the problem, the servers of happiness value is len(E) where E is the set of edges used in the maximum matching bipartite graph of shares to servers.

In terms of your example zooko, when you place a share on one server, you are constructing an edge between that share and that server in the bipartite graph. Therefore, the servers of happiness value is 1.

However, we do use a boolean value to indicate whether a file is healthy. We should measure this value by h_actual >= h_min where h_actual is the happiness value of the share distribution and h_min is the minimum happiness level specified by the user. However, this hasn't been implemented yet (see ticket #614).

h_max is defined as min(len(shares), len(servers)) where shares and servers are sets. This is because we will always be limited by the smallest set in a maximum matching bipartite graph.
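
For concreteness, here is a rough Python sketch of that definition. It is illustrative only, not the actual Tahoe-LAFS peer-selection code; the function name and the `placements` layout are invented for this ticket.

```python
def servers_of_happiness(placements):
    """Size of a maximum matching between servers and shares.

    `placements` maps a server id to the set of share numbers it holds.
    The result is the largest number of (server, share) pairs such that
    no server and no share is used more than once.
    """
    match = {}  # share number -> server currently matched to that share

    def try_assign(server, seen):
        # Try to match `server` to one of its shares, re-routing other
        # servers along an augmenting path if necessary.
        for share in placements[server]:
            if share in seen:
                continue
            seen.add(share)
            if share not in match or try_assign(match[share], seen):
                match[share] = server
                return True
        return False

    return sum(1 for server in placements if try_assign(server, set()))
```

On zooko's example (K=3, a single share placed on one server) this returns 1, matching daira's correction, and `min(len(all_shares), len(servers))` is the h_max bound markberger describes.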

amontero commented 2013-12-01 19:55:38 +00:00
Owner

As per https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2123#comment:395664, I've done a writeup at #2124 with a feature proposal that might be of interest here.

Author

Thanks, markberger!

Replying to markberger:

Servers of happiness is not a boolean value. It is a measurement of how effectively shares are distributed. For each distinctive server with any nonzero number of shares, the servers of happiness value increases by 1. In terms of how we model the problem, the servers of happiness value is len(E) where E is the set of edges used in the maximum matching bipartite graph of shares to servers.

In terms of your example zooko, when you place a share on one server, you are constructing an edge between that share and that server in the bipartite graph. Therefore, the servers of happiness value is 1.

Hm, isn't this true only if that share isn't already provided by another server? Or more precisely true only if some combination of the servers already holding shares wouldn't be able to provide the file in just as many cases without the addition of this new share?

Okay, so I think what this means is that sickness and amontero and me would be satisfied with Rule 3:

Rule 3: "Don't put a share on a server unless doing so would increase the Servers-of-Happiness-Level."

Does everyone agree that this is a good rule to enforce on an uploader or repairer? markberger? daira? sickness? amontero? brian?
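
To make Rule 3 concrete, here is a hypothetical greedy sketch of an uploader loop that enforces it, reusing the illustrative `servers_of_happiness()` helper sketched earlier in this ticket. This is not the real upload code, just the rule restated in Python with invented names.

```python
def place_shares_rule3(all_shares, candidate_servers, existing):
    """Place shares one at a time, skipping any placement that would not
    raise the happiness level (Rule 3).  `existing` maps server -> set of
    shares already held; both arguments are hypothetical data layouts."""
    placements = {server: set(held) for server, held in existing.items()}
    happiness = servers_of_happiness(placements)
    for share in all_shares:
        for server in candidate_servers:
            trial = {s: set(held) for s, held in placements.items()}
            trial.setdefault(server, set()).add(share)
            if servers_of_happiness(trial) > happiness:
                placements, happiness = trial, happiness + 1
                break  # this share is placed exactly once
        # if no server raises happiness, the share is not uploaded at all
    return placements
```

Whether this greedy order always reaches the maximum achievable happiness is exactly the kind of question the algorithm from Kevan's thesis addresses; the sketch only illustrates the rule itself.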

sickness commented 2013-12-01 22:13:43 +00:00
Owner

rule 3 is ok to me too! :)

P.S.: that said I would like to stress my previous explanation ticket https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2106 with another even more simple example:

let's assume we have a 2 servers "RAIC" setup which mimics the classic 2 local disks RAID1 setup

we want tahoe-lafs to write 1 share of any file on both servers so in case one of them dies, the other holds all the necessary shares to recover the file, something like 1:2:2

now what if a server dies, or is unreachable, and we write BOTH shares on the only one remaining server?

current uploader would be happy, and when the broken/gone server will be eventually back, current repairer wouldn't create a share on it because the other server already has 2 shares...

...ok now what if the server which holds both shares dies? :/
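
For what it's worth, sickness's scenario can be expressed with the illustrative `servers_of_happiness()` sketch from earlier in this ticket (1-of-2 encoding, shares.happy = 2, share numbers and server names invented):

```python
# Server B is unreachable at upload time, so both shares land on server A:
servers_of_happiness({"A": {0, 1}})         # -> 1, below H = 2
# Under Rule 3 the second share would never be placed on A at all, because
# doing so does not raise the happiness level above 1.
# Once B is back, placing a share there does raise it:
servers_of_happiness({"A": {0}, "B": {1}})  # -> 2, the RAID1-like state sickness wants
```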

Author

Replying to sickness:

rule 3 is ok to me too! :)

P.S.: that said I would like to stress my previous explanation ticket https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2106 with another even more simple example:

let's assume we have a 2 servers "RAIC" setup which mimics the classic 2 local disks RAID1 setup

we want tahoe-lafs to write 1 share of any file on both servers so in case one of them dies, the other holds all the necessary shares to recover the file, something like 1:2:2

now what if a server dies, or is unreachable, and we write BOTH shares on the only one remaining server?

Right. Good, simple example. Rule 3 would prevent the uploader or repairer from putting both shares on one server.

Replying to zooko (comment:13):

Hm, isn't this true only if that share isn't already provided by another server? Or more precisely true only if some combination of the servers already holding shares wouldn't be able to provide the file in just as many cases without the addition of this new share?

Yes, due to the nature of a maximum matching bipartite graph, the servers of happiness value will only increase for shares that have not been placed somewhere else. And yes, when a share is placed on a server that already contains some nonzero number of shares, the servers of happiness value does not increase.

amontero commented 2013-12-02 11:33:04 +00:00
Owner

Replying to zooko (comment:13):

Thanks, markberger!

Replying to markberger:

Servers of happiness is not a boolean value. It is a measurement of how effectively shares are distributed. For each distinctive server with any nonzero number of shares, the servers of happiness value increases by 1. In terms of how we model the problem, the servers of happiness value is len(E) where E is the set of edges used in the maximum matching bipartite graph of shares to servers.

In terms of your example zooko, when you place a share on one server, you are constructing an edge between that share and that server in the bipartite graph. Therefore, the servers of happiness value is 1.

Hm, isn't this true only if that share isn't already provided by another server? Or more precisely true only if some combination of the servers already holding shares wouldn't be able to provide the file in just as many cases without the addition of this new share?

Okay, so I think what this means is that sickness and amontero and me would be satisfied with Rule 3:

Rule 3: "Don't put a share on a server unless doing so would increase the Servers-of-Happiness-Level."

Does everyone agree that this is a good rule to enforce on an uploader or repairer? markberger? daira? sickness? amontero? brian?

Looks to me like it would fulfill the needs of my scenario of "not wasting space in any storage node (waste = shares > k)" (#2123).

Also, sickness' RAIC 1:2:2 example looks very appealing to me, with one question: would this modification allow you to add a 3rd hot-spare node that would get its own share when repair is run? (e.g. before decommissioning a soon-to-fail node)

In other words: would this place shares, beyond N's limit, on new nodes that currently have none?

Author

Replying to amontero (comment:17):

Also, sickness' RAIC 1:2:2 example looks very appealing to me, with one question: would this modification allow you to add a 3rd hot-spare node that would get its own share when repair is run? (e.g. before decommissioning a soon-to-fail node)

In other words: would this place shares, beyond N's limit, on new nodes that currently have none?

This would not. If you want that, then you should probably set N higher in the first place. E.g. you could have two servers, K=1, H=2, and set N=3. Then whenever the uploader or repairer runs, it puts one share on each of the two servers and considers itself to have succeeded. If you attach a third server, then it would attempt to upload a 3rd share to that 3rd server.

Replying to zooko (comment:18):

Replying to amontero (comment:17):

Also, sickness' RAIC 1:2:2 example looks very appealing to me, with one question: would this modification allow you to add a 3rd hot-spare node that would get its own share when repair is run? (e.g. before decommissioning a soon-to-fail node)

In other words: would this place shares, beyond N's limit, on new nodes that currently have none?

This would not. If you want that, then you should probably set N higher in the first place. E.g. you could have two servers, K=1, H=2, and set N=3. Then whenever the uploader or repairer runs, it puts one share on each of the two servers and considers itself to have succeeded.

Actually it will also put the 3rd share on one of the two servers.

If you attach a third server, then it would attempt to upload a 3rd share to that 3rd server.

In that case repair would put another copy of the 3rd share on the 3rd server, yes. When #1816 is fixed, only 3 shares would have their leases renewed (the redundant copy of the 3rd share on one of the first two servers would not).

Author

Replying to daira (comment:19):

This would not. If you want that, then you should probably set N higher in the first place. E.g. you could have two servers, K=1, H=2, and set N=3. Then whenever the uploader or repairer runs, it puts one share on each of the two servers and considers itself to have succeeded.

Actually it will also put the 3rd share on one of the two servers.

Not if Rule 3 from this ticket (#2107) is implemented in the uploader!

Replying to zooko (comment:20):

Replying to daira (comment:19):

This would not. If you want that, then you should probably set N higher in the first place. E.g. you could have two servers, K=1, H=2, and set N=3. Then whenever the uploader or repairer runs, it puts one share on each of the two servers and considers itself to have succeeded.

Actually it will also put the 3rd share on one of the two servers.

Not if Rule 3 from this ticket (#2107) is implemented in the uploader!

Huh. Well then I don't agree with Rule 3. I thought we'd already agreed that once all servers have at least one share, any excess shares up to shares.total should also be distributed fairly evenly between the available servers?

Author

Replying to daira (comment:21):

Huh. Well then I don't agree with Rule 3. I thought we'd already agreed that once all servers have at least one share, any excess shares up to shares.total should also be distributed fairly evenly between the available servers?

No! We have certainly not all agreed to that. That is the opposite of what sickness and amontero, and to a lesser extent I myself, want, which is the topic of this ticket. They want, if I understand correctly, that once all servers have at least one share, then any excess shares do not get uploaded to any servers.

Owner

(I've been too busy to pay attention, so apologies if me throwing out a random comment is not helpful.)

It seems clear that "servers of happiness" is an approximation to a richer property that cannot be described so simply. So a rule written in terms of SoH is going to be an approximation to the correct behavior.

The real question is maximizing the ability to retrieve the file, given some assumed probability distribution of a) a server losing a share, but staying up and b) a server going away, traded off against storage cost somehow. In particular, I'm not sure how often a) happens, and whether we want to support the notion of assigning different probabilities to different servers, or correlated probabilities. So two placements with the same SoH can still have different probabilities, and the higher one is still preferred.

A key issue is when the number of servers is less than the number of shares, vs when it's more. I suspect that behaviors are quite different then. We've been talking about that, but perhaps that great divide should be more front and center in the discussion. It's also not reasonable to expect people to tweak encoding as the number of servers changes. It should be sane to run 3/10 all the time, and just place more shares if there are 4 servers.

It seems obvious to me that better balance is better. But what about S=4, 3/10 encoding. Is 1,3,3,3 really worse than 2,3,2,3? Or is it better, because if 3 out of 4 are lost, there are 3 ways to win and one to lose, vs 2 and 2 with 2,3,2,3? So it seems that regardless of probabilities, 1,3,3,3 is more robust. I'm not sure how to generalize this, except that for k=3 one should not put more than k until all servers have k.
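
gdt's 1,3,3,3 versus 2,3,2,3 comparison is easy to check by brute force. Here is a throwaway Python sketch, assuming 3-of-10 encoding, each server holding a disjoint set of shares, and equally likely, independent server failures (all assumptions layered on top of his example, not anything from the real code):

```python
from itertools import combinations

def surviving_patterns(share_counts, k=3):
    """For each possible number of failed servers, count the failure
    patterns that still leave at least k distinct shares available."""
    n = len(share_counts)
    return {
        failed: sum(
            1
            for lost in combinations(range(n), failed)
            if sum(c for i, c in enumerate(share_counts) if i not in lost) >= k
        )
        for failed in range(n + 1)
    }

print(surviving_patterns([1, 3, 3, 3]))  # lose 3 of 4 servers: 3 of the 4 patterns still recover
print(surviving_patterns([2, 3, 2, 3]))  # lose 3 of 4 servers: only 2 of the 4 patterns recover
```

For the loss of any three servers, 1,3,3,3 wins in three of the four cases and 2,3,2,3 in only two, which is gdt's point; for zero, one, or two failed servers the two placements behave identically.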

Owner

@zooko: So for 3/10, 4 servers, do you really want to stop with 1 share on each server? That seems gratuitously risky.
(I certainly agree that with 3/10 and S>10 servers, one wants 1 share on each of 10 servers.)

PRabahy commented 2013-12-03 02:13:12 +00:00
Owner

My vote (if I get one) is to leave this as it currently is. My first reaction when learning about erasure-coding was that Shares Needed and Total Shares were set in stone. I have always felt that Share/Servers of Happiness was more of just a suggestion.

When I first started using the public test grid, I set several files to intentionally have more Shares Needed and Total Shares than there were total nodes in the grid. I figured that as the grid grew, I would be able to re-balance these shares to gain better redundancy.

Author

Replying to PRabahy:

My vote (if I get one) is to leave this as it currently is.

Wait, I'm sorry but I don't know what that means. The whole topic is dizzyingly confusing to me. It's amazing to me that after, what 14 years of working with wide-area decentralized erasure-coding, I can't even formulate a rigorous statement of what I want.

PRabahy: what do you think of "Rule 3"?

*Rule 3: "Don't put a share on a server unless doing so would increase the Servers-of-Happiness-Level."*

Where "Servers-of-Happiness-Level" is defined as:

*Servers-of-Happiness-Level: the size of the largest set of servers such that **any** K-sized subset of it can recover the file.*

(I think. Did I get that definition right??)

gdt: what do you think of Rule 3?

My guess is that none of the following people even understand what Servers of Happiness or Rule 3 mean: PRabahy, sickness, amontero, gdt, zooko, warner. That leaves daira, markberger, and Kevan Carstensen as the only people on the planet who understand what Servers of Happiness even means.

This is distressing to me, because of the notion that a thing isn't good enough just by providing good behavior; in addition it also has to be understandable and predictable to users. I'm currently thinking that no decentralized erasure-coding system can do that. Perhaps the only solution is to strip erasure-coding out of a future version of LAFS and make K always = 1 (i.e. straight replication). ;-)

Owner

So no, I did not really follow the s-o-h definition. I don't see why a k-sized subset is the right metric, if K is the shares-needed (e.g. 3 in 3-of-10 coding). If I have 4 servers and 3/10 encoding, then I certainly want any 3 servers to recover, but I want more than that, because losing 2 of 4 is not so unthinkable. What I want is to minimize the probability that the file will be unrecoverable, given some arguably reasonable assumptions, and subject to not trying to store any particular share on more than one server, on the theory that such duplication is wasteful and should be done by changing the coding instead.

For S >> N, I see why s-o-h makes sense; it essentially measures the number of servers that have a share, but not quite. But I don't see how s-o-h strictly relates to probability of success (even assuming equal probability of node failure). And probability of success is what we should care about, not some intermediate metric whose relation to what we care about is hard to understand.

Here's why I'm opposed to implementing this ticket now: The behaviour of spreading out shares between available servers in a round-robin fashion is very long-standing. That's the behaviour expected and possibly relied on by people who have chosen their encoding parameters with the current placement algorithm in mind. #1382 was supposed to make the minimal change to the placement algorithm necessary to achieve happiness in cases where #778 caused regressions. If we want to do something else, let's decide on that after getting feedback from release 1.11.

PRabahy commented 2013-12-03 03:04:58 +00:00
Owner

Replying to zooko (comment:26):

PRabahy: what do you think of "Rule 3"?

*Rule 3: "Don't put a share on a server unless doing so would increase the Servers-of-Happiness-Level."*

Where "Servers-of-Happiness-Level" is defined as:

*Servers-of-Happiness-Level: the size of the largest set of servers such that **any** K-sized subset of it can recover the file.*

(I think. Did I get that definition right??)
I agree with your definition of "Servers-of-Happiness-Level", but do not believe that rule 3 is the most intuitive and robust rule. If Tahoe is able to place a share in a way that reduces the number of nodes needed to recover a file, that should be done. That said, Tahoe should never intentionally store the same share twice (or we could end up at the degenerate case of all nodes have K shares).

(Most of the following echoes what gdt said.)
For example, I have a file encoded 2-of-5 and a grid with 2 nodes. Once each node has a share, the file is recoverable and the "Servers-of-Happiness-Level" cannot increase any further. I believe that each node should receive 2 distinct shares and the 5th should be discarded. This will allow the file to be recovered from either node but not waste any space on the grid. If a 3rd node comes online and a repair is performed, the 5th share should be regenerated and placed on the 3rd node (both original nodes keep their same shares). If a 4th node comes online and a repair is performed, one of the original nodes should transfer a share to the new node.

Hopefully this is clear. I know how confusing matching intuition with formal definitions can be.
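
PRabahy's 2-of-5 example can be checked the same brute-force way; this is an illustrative sketch with invented node names, not real Tahoe-LAFS code. With two distinct shares per node the file survives the loss of either node, while a strict Rule-3 placement of one share per node survives neither:

```python
def recoverable(placements, k=2):
    """True if the distinct shares held across `placements` can decode the file."""
    held = set().union(*placements.values()) if placements else set()
    return len(held) >= k

doubled = {"node1": {0, 1}, "node2": {2, 3}}  # PRabahy's placement, share 4 discarded
rule3   = {"node1": {0}, "node2": {1}}        # strict Rule 3: one share per node

for name, placement in [("doubled", doubled), ("rule3", rule3)]:
    for lost in placement:
        rest = {node: shares for node, shares in placement.items() if node != lost}
        print(name, "after losing", lost, "->",
              "recoverable" if recoverable(rest) else "LOST")
```

That difference is exactly the trade-off being argued about in this ticket: Rule 3 minimizes space consumed, while the doubled-up placement minimizes the chance of losing the file.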

Replying to zooko (comment:26):

Where "Servers-of-Happiness-Level" is defined as:

*Servers-of-Happiness-Level: the size of the largest set of servers such that **any** K-sized subset of it can recover the file.*

(I think. Did I get that definition right??)

Not exactly. Suppose that there is no K-sized subset of servers that can recover the file; then there is no such largest set, and so this wording would not give a well-defined result.

The correct definition is:

Servers-of-Happiness-Level: the size of a maximum matching in the bipartite graph relating servers to shares that they hold.

or equivalently:

Servers-of-Happiness-Level: the maximum number of links that can be drawn from servers to shares that they hold, such that no server and no share appears more than once.

The original definition could instead be repaired by setting the result to 0 when there is no K-sized subset that can recover the file, but that would be less useful. The difference doesn't matter (since the definitions coincide) for deciding when an upload or repair meets a happiness threshold shares.happy >= K, but it does matter for defining intermediate steps of a placement algorithm.
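
The difference shows up in the degenerate case daira mentions. With the illustrative `servers_of_happiness()` sketch from earlier in this ticket (K = 3, but only two servers holding one share each):

```python
# No 3-server subset can recover the file, so under zooko's wording the
# value is unclear, while the matching definition still yields a useful
# intermediate value:
servers_of_happiness({"A": {0}, "B": {1}})   # -> 2, even though 2 < K = 3
```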

Author

Replying to daira (comment:30):

Servers-of-Happiness-Level: the size of a maximum matching in the bipartite graph relating servers to shares that they hold.

I'd like to come up with a succinct specification of this which is understandable to people who don't know the phrases "matching" and "bipartite graph".

Servers-of-Happiness-Level: the maximum number of links that can be drawn from servers to shares that they hold, such that no server and no share appears more than once.

Closer! How about:

Servers-of-Happiness-Level: how many "server↔share" pairs are there, not counting any server more than once and not counting any share more than once.

Author

Replying to zooko (comment:31):

Servers-of-Happiness-Level: how many "server↔share" pairs are there, not counting any server more than once and not counting any share more than once.

…following-up to my own post…

Oh, to be more precise, we have to specify that the Servers-of-Happiness-Level is the size of the largest such set of "server↔share" pairs. There is only one such set of "server↔share" pairs in the "ideal" case (e.g. number of servers > N, no server disconnects before an upload and then reconnects after an upload, etc.), but there could be different ways to choose "server↔share" pairs in non-ideal cases, such as:

"server↔share" pairs:
  • server 0 ↔ share 1
  • server 1 ↔ share 1
  • server 1 ↔ share 2
  • server 2 ↔ share 2

To determine what the Servers-of-Happiness-Level of this situation is, you have to pick a subset of the pairs so that the resulting subset doesn't have any server appearing more than once or any share appearing more than once. Therefore the Servers-of-Happiness-Level of this situation is 2.

So, this ticket (#2107) is about an uploader or repairer choosing not to upload a share to any server, if uploading that share would not increase the Servers-of-Happiness-Level. For example, in the case shown above, if the uploader is considering uploading share 0, it could do that and increase the Servers-of-Happiness-Level from 2 to 3:

"server↔share" pairs:
  • server 0 ↔ share 0
  • server 0 ↔ share 1
  • server 1 ↔ share 1
  • server 1 ↔ share 2
  • server 2 ↔ share 2

But once it has done that, if it is then considering uploading share 3, it cannot increase the Servers-of-Happiness-Level by uploading share 3, so if #2107 is accepted, it will not upload share 3 at all.
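
Under the rule proposed here, the uploader would perform a check like the following before placing a share. This is a hypothetical sketch reusing the `happiness_level` helper from the sketch above, not the actual uploader logic; `placement_increases_happiness` is an invented name.

```python
def placement_increases_happiness(existing_pairs, server, share):
    """Would placing `share` on `server` raise the Servers-of-Happiness-Level?
    Under this ticket's proposal, the uploader skips the placement (and, if no
    server qualifies, skips the share entirely) when this returns False."""
    before = happiness_level(existing_pairs)
    after = happiness_level(list(existing_pairs) + [(server, share)])
    return after > before

# The second example above, after share 0 has been placed on server 0:
pairs = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]
# With only three servers the level is already 3, so placing share 3
# anywhere cannot raise it, and share 3 would not be uploaded at all:
print(any(placement_increases_happiness(pairs, s, 3) for s in (0, 1, 2)))  # -> False
```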

If I understand correctly, sickness, amontero, and recent-zooko are in favor of #2107 (do not upload a share if doing so does not increase the Servers-of-Happiness-Level), and gdt and previous-zooko are in favor of the opposite: go ahead and upload an extra share even if it doesn't increase the Servers-of-Happiness-Level. I think PRabahy's comment:395687 is suggesting a different metric (which doesn't yet have a name), and to achieve that metric you have to upload shares even when uploading them doesn't increase the Servers-of-Happiness-Level.

Daira's comment:395686 is pretty persuasive to me: let's not do ticket #2107 until after Tahoe-LAFS v1.11 is released, on the basis of not changing multiple things at a time unnecessarily.

Author

Servers-of-Happiness-Level: the size of the largest set of server↔share pairs, such that no server appears more than once in the set and no share appears more than once in the set.

Author

Whatever we ultimately decide about this, we're going to release Tahoe-LAFS v1.11 before we do anything!
