add note and link to John Case's letter to tahoe-dev requesting server selection

[Imported from Trac: page ServerSelection, version 14]
zooko 2011-04-26 19:06:13 +00:00
parent beeec4337a
commit 671c5bcba5

@ -11,7 +11,8 @@ Different users of Tahoe-LAFS have different desires for "Which servers should I
* Several people -- again I'm sorry I've forgotten specific attribution -- want to identify which servers live in which cluster or co-lo or geographical area, and then to distribute shares evenly across clusters/colos/geographical-areas instead of evenly across servers.
* Here's an example of this desire: Nathan Eisenberg asked on the mailing list for "Proximity Aware Decoding": <http://tahoe-lafs.org/pipermail/tahoe-dev/2009-December/003286.html>
* If you have *K+1* shares stored in a single location then you can repair after a loss (such as a hard drive failure) in that location without having to transfer data from other locations. This can save bandwidth expenses (since intra-location bandwidth is typically free), and of course it also means you can recover from that hard drive failure in that one location even if all the other locations have been stomped to death by Godzilla.
* This is called "rack awareness" in the Hadoop and Cassandra projects, where the unit of distribution would be the rack.
* John Case wrote a letter to tahoe-dev asking for this feature and comparing it to the concept of "families" in the Tor project: <http://tahoe-lafs.org/pipermail/tahoe-dev/2011-April/006301.html>
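
To make the location-aware placement described in the list above concrete, here is a minimal sketch in Python. It is not Tahoe-LAFS code: the function name `spread_shares_by_location`, the server ids and the location labels are all hypothetical, and it assumes the user has already told us which location each server lives in.

```python
from collections import defaultdict


def spread_shares_by_location(server_locations, num_shares):
    """Assign shares round-robin across locations, not across servers.

    server_locations: dict of server id -> location label (both hypothetical).
    Returns a dict of share number -> server id.
    """
    if not server_locations:
        raise ValueError("no servers available")

    # Group servers by the location the user labelled them with.
    by_location = defaultdict(list)
    for server, location in sorted(server_locations.items()):
        by_location[location].append(server)
    locations = sorted(by_location)

    placement = {}
    for share in range(num_shares):
        # Rotate through locations first, so each location gets an even count.
        servers = by_location[locations[share % len(locations)]]
        # Within a location, rotate through its servers on each full pass.
        placement[share] = servers[(share // len(locations)) % len(servers)]
    return placement
```

With two servers in one location and one server in each of two others, six shares land two per location, even though that puts two shares on each of the single-server locations' machines.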
As I have emphasized a few times, we really should not try to write a super-clever algorithm into Tahoe which satisfies all of these people, plus all the other crazy people that will be using Tahoe-LAFS for other things in the future. Instead, we need some sort of configuration language or plugin system so that each crazy person can customize their own crazy server selection policy. I don't know the best way to implement this yet -- a domain specific language? Implement the above-mentioned list of seven policies into Tahoe-LAFS and have an option to choose which of the seven you want for this upload? My current favorite approach is: you give me a Python function. When the time comes to upload a file, I'll call that function and then use whichever servers it said to use.
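
The "you give me a Python function" idea could look roughly like the sketch below. Everything in it is an assumption made for illustration: `upload_with_policy`, the `select_servers(servers, num_shares)` signature, and the sample policy are hypothetical names, not an existing Tahoe-LAFS interface.

```python
def upload_with_policy(candidate_servers, num_shares, select_servers):
    """Ask the user-supplied policy which servers to use, then check the answer.

    select_servers is any callable taking (servers, num_shares) and
    returning one server id per share (hypothetical signature).
    """
    candidates = list(candidate_servers)
    chosen = list(select_servers(candidates, num_shares))

    # The policy may only name servers we actually offered it.
    unknown = [s for s in chosen if s not in candidates]
    if unknown:
        raise ValueError(f"policy returned unknown servers: {unknown}")
    if len(chosen) != num_shares:
        raise ValueError(f"policy returned {len(chosen)} servers for {num_shares} shares")

    # Share number -> server id; a real uploader would now send the shares.
    return dict(enumerate(chosen))


def alphabetical_policy(servers, num_shares):
    """Example policy: cycle through the servers in sorted order."""
    ordered = sorted(servers)
    return [ordered[i % len(ordered)] for i in range(num_shares)]
```

The uploader only validates what the policy returns, so a user who wants location awareness, a trusted-server whitelist, or something crazier swaps in a different function without any change to the upload code.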
#### Brian says: