From cd666efedc897c641f9af766ad06b08b25b6334c Mon Sep 17 00:00:00 2001
From: kevan <>
Date: Tue, 16 Mar 2010 02:49:59 +0000
Subject: [PATCH] [Imported from Trac: page GSoCIdeas/Notes, version 2]

---
 GSoCIdeas/Notes.md | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/GSoCIdeas/Notes.md b/GSoCIdeas/Notes.md
index d9511f5..352268b 100644
--- a/GSoCIdeas/Notes.md
+++ b/GSoCIdeas/Notes.md
@@ -36,7 +36,44 @@ Students: you don't have to use one of the following Ideas. You can come up wi
 ## Server Selection
 *Which servers are connected to your client, and which of them have which shares of your files?*
 
- * Dynamically migrate shares to maintain file health.
+ === Dynamically migrate shares to maintain file health. ===
+
+Difficulty: medium - hard
+
+When uploading a file to a grid, Tahoe-LAFS will make sure that the file is
+healthy (a good discussion of what healthy means is found in #778) before
+reporting that the file is uploaded successfully. Tools to effectively
+maintain file health (or to adapt to new definitions of health) aren't
+quite complete, however -- our users have had several use cases that aren't
+easily addressed with what we have. Students taking this project would be
+building tools to address those use cases.
+
+A good starting point would be to become familiar with how files are placed on
+a grid. [architecture.txt](http://allmydata.org/trac/tahoe-lafs/browser/docs/architecture.txt),
+[file-encoding.txt](http://allmydata.org/trac/tahoe-lafs/browser/docs/specifications/file-encoding.txt),
+[mutable.txt](http://allmydata.org/trac/tahoe-lafs/browser/docs/specifications/mutable.txt),
+[the immutable file upload code](http://allmydata.org/trac/tahoe-lafs/browser/src/allmydata/immutable/upload.py), and
+[the mutable file upload code](http://allmydata.org/trac/tahoe-lafs/browser/src/allmydata/mutable/publish.py) are good
+places to do that. Also, you might want to look at the
+[storage server code](http://allmydata.org/trac/tahoe-lafs/browser/src/allmydata/storage/server.py) to understand that
+better. Some good tickets to start looking at are #699, #543, and #232; you'll
+find that those link to other tickets.
+
+There are many ways to help address these issues. Some ideas:
+
+ * Alter the CLI and the WUI to give users the ability to rebalance
+   files that they've uploaded already. (#699)
+ * Build tools that allow node administrators to move shares around
+   a grid (#543, #864)
+ * Alter Tahoe-LAFS to rebalance mutable files when uploading a new version
+   of them. (#232)
+
+(it is doubtful that any one of these projects is enough to fill a summer, but, combined, they would be a big usability improvement for Tahoe-LAFS)
+
+Depending on how you address this, this is tightly integrated with ideas of
+file health and accounting, so prospective students would do well to explore
+those open issues, too. A good accounting jumping-off point is #666. A good
+jumping-off point for health is #778.
  * Use Zeroconf or similar so nodes can find each other on a local network to enable quick local share migration.
  * Deal with unreliable nodes and connections in general, getting away from allmydata.com's assumption that the grid is a big collection of reliable machines in a colo under a single administrative jurisdiction. [Tickets labelled 'availability'](http://allmydata.org/trac/tahoe-lafs/query?status=!closed&order=priority&keywords=~availability)
  * Abstract out the server selection part of Tahoe-LAFS so that the projects in this category of "grid membership and server selection" can be mostly independent of the rest of Tahoe-LAFS. See also [this note about standardization of LAFS](http://testgrid.allmydata.org:3567/uri/URI:DIR2-RO:j74uhg25nwdpjpacl6rkat2yhm:kav7ijeft5h7r7rxdp5bgtlt3viv32yabqajkrdykozia5544jqa/wiki.html#2009-02-06).