add Kevan's MDMF write-up

[Imported from Trac: page GSoCIdeas2010, version 98]
zooko 2010-03-29 01:21:16 +00:00
parent 4f23050a34
commit 985d2ecf72

@ -1,4 +1,4 @@
# Tahoe-LAFS Summer-of-Code Projects
This page contains specific suggestions for projects we would like to see in the Summer of Code. Note that they vary a lot in required skills and difficulty. We hope to get applications that cover this broad spectrum.
@ -14,6 +14,7 @@ Deadlines and directions for students' applications to the Google Summer-of-Code
|*Project*|*Difficulty*|*Contact*|
|---|---|---|
|[Redundant Array of Independent Clouds](#redundant-array-of-independent-clouds)|Medium|[Zooko Wilcox-O'Hearn](mailto:zooko@zooko.com) or any mentor|
|[Share Migration](#share-migration)|Medium|[Brian Warner](mailto:warner-tahoe@lothar.com) or any mentor|
|[Secure Decentralized Wiki](#secure-decentralized-wiki)|Medium|[Zooko Wilcox-O'Hearn](mailto:zooko@zooko.com) or any mentor|
|[Cloud Apps](#cloud-apps)|Easy to Hard|[Jack Lloyd](mailto:lloyd@randombit.net) or any mentor|
@ -25,11 +26,38 @@ Deadlines and directions for students' applications to the Google Summer-of-Code
----
## Medium-Sized Distributed Mutable Files (MDMF)
Mutable files in Tahoe-LAFS have some significant limitations and
performance issues, as discussed in
[docs/performance.txt](http://allmydata.org/trac/tahoe-lafs/browser/docs/performance.txt). Users who are unaware of these limitations are
often surprised to find that mutable files cannot scale to large
sizes without consuming unacceptable amounts of memory, and that reading a single
byte of a mutable file costs as much as reading the entire file.
Fixing this essentially means fixing ticket #393, which involves:
* Making mutable files segmented on upload, as immutable files already are. Part of this work is checking whether the current mechanism for ensuring the integrity of the mutable-file pieces stored on servers is adequate for the new design, and altering it if it is not.
* Implementing efficient reading and writing of arbitrary spans of those mutable files.
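Once a file is segmented, a read of an arbitrary span only needs the segments that overlap it, so reading one byte touches one segment instead of the whole file. The sketch below shows just that offset arithmetic. It is a sketch under stated assumptions: the function names are hypothetical, the 128 KiB segment size is assumed for illustration, and "segments" are a plain in-memory list rather than encrypted, erasure-coded shares fetched from servers.

```python
from typing import List

# Assumed segment size for illustration only; a real design would make this configurable.
SEGMENT_SIZE = 128 * 1024


def segments_for_span(offset: int, length: int, segment_size: int = SEGMENT_SIZE) -> range:
    """Return the indices of the segments overlapping bytes [offset, offset + length)."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if length == 0:
        return range(0)
    first = offset // segment_size
    last = (offset + length - 1) // segment_size  # segment holding the final requested byte
    return range(first, last + 1)


def read_span(segments: List[bytes], offset: int, length: int,
              segment_size: int = SEGMENT_SIZE) -> bytes:
    """Reassemble a byte span while touching only the segments that cover it."""
    needed = segments_for_span(offset, length, segment_size)
    if len(needed) == 0:
        return b""
    joined = b"".join(segments[i] for i in needed)
    start = offset - needed.start * segment_size  # position of `offset` inside `joined`
    return joined[start:start + length]
```

For example, with 4-byte segments, reading 3 bytes at offset 3 touches only segments 0 and 1. A writer could use the same arithmetic to rewrite (and re-hash) only the segments a write overlaps.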
This would make Tahoe-LAFS less surprising to users and allow mutable
files to be used in more ways than they are today. If the work goes well, it might even let Tahoe-LAFS support range queries or "graph database"-style access, like the "NoSQL" projects.
To learn more about this issue, you should first read
[docs/performance.txt](http://allmydata.org/trac/tahoe-lafs/browser/docs/performance.txt), so you're familiar with the performance problems
with mutable files as currently implemented. You should also look at the
[file encoding specification](http://allmydata.org/trac/tahoe-lafs/browser/docs/specifications/file-encoding.txt), to understand how immutable files are
segmented (since you'll be doing something similar with this project). [The mutable file specification](http://allmydata.org/trac/tahoe-lafs/browser/docs/specifications/mutable.txt) may be informative as well.
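Immutable files protect their segments with hash trees, which lets a reader verify one segment without downloading the rest. A segmented mutable file would presumably need something similar. Below is a minimal sketch of a binary Merkle tree over segments: the reader checks a single segment against one root hash using only about log2(n) sibling hashes. This is not Tahoe-LAFS's actual layout. It assumes plain SHA-256 (Tahoe-LAFS uses its own tagged hashes) and pairs an odd last node with itself, and all function names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    # Plain SHA-256 for illustration; not Tahoe-LAFS's tagged hash construction.
    return hashlib.sha256(data).digest()


def build_tree(segments: List[bytes]) -> List[List[bytes]]:
    """Return every level of a binary hash tree, leaves first and root last."""
    if not segments:
        raise ValueError("need at least one segment")
    level = [_h(seg) for seg in segments]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pair an odd last node with itself
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def auth_path(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf `index` up to the root; the flag is True if the sibling is on the right."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # odd level: this node was paired with itself
        path.append((level[sibling], index % 2 == 0))
        index //= 2
    return path


def verify_segment(segment: bytes, path: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one segment plus its authentication path."""
    node = _h(segment)
    for sibling, sibling_is_right in path:
        node = _h(node + sibling) if sibling_is_right else _h(sibling + node)
    return node == root
```

Under this scheme, the root hash is the single value that must be authenticated. Any tampered segment then fails `verify_segment`.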
The mutable file upload and download code is in
[mutable](http://allmydata.org/trac/tahoe-lafs/browser/src/allmydata/mutable),
and, for comparison, the immutable file upload and download code is in
[immutable](http://allmydata.org/trac/tahoe-lafs/browser/src/allmydata/immutable).
## Redundant Array of Independent Clouds
Add backends to the storage servers so that they store their shares on a cloud storage system instead of on their local filesystem. This means that you can get all of the availability and scalability of services such as Amazon S3 or Rackspace CloudFiles combined with the security properties of Tahoe-LAFS. See [the RAIC diagram](http://allmydata.org/~zooko/RAIC.png). For details, read ticket #999, which includes pointers to the relevant source code and instructions on how to begin writing the code.
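One way to approach this is to put share storage behind a small interface, so the storage server does not care whether bytes land on local disk or in a cloud object store. The sketch below illustrates that shape only. `ShareBackend`, `DiskBackend`, and `InMemoryObjectStore` are hypothetical names, not Tahoe-LAFS classes, the on-disk layout is simplified, and a dictionary stands in for a real cloud service such as S3.

```python
import abc
import os
import tempfile
from typing import Dict


class ShareBackend(abc.ABC):
    """Hypothetical interface for where a storage server keeps share bytes."""

    @abc.abstractmethod
    def put_share(self, storage_index: str, shnum: int, data: bytes) -> None: ...

    @abc.abstractmethod
    def get_share(self, storage_index: str, shnum: int) -> bytes: ...


class DiskBackend(ShareBackend):
    """Stores each share as a file under a local directory (simplified layout)."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, storage_index: str, shnum: int) -> str:
        return os.path.join(self.root, storage_index, str(shnum))

    def put_share(self, storage_index: str, shnum: int, data: bytes) -> None:
        path = self._path(storage_index, shnum)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def get_share(self, storage_index: str, shnum: int) -> bytes:
        with open(self._path(storage_index, shnum), "rb") as f:
            return f.read()


class InMemoryObjectStore(ShareBackend):
    """Stand-in for a cloud object store: one object per share, keyed by a string."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def _key(self, storage_index: str, shnum: int) -> str:
        return f"shares/{storage_index}/{shnum}"

    def put_share(self, storage_index: str, shnum: int, data: bytes) -> None:
        self.objects[self._key(storage_index, shnum)] = data  # a real backend would PUT an object here

    def get_share(self, storage_index: str, shnum: int) -> bytes:
        return self.objects[self._key(storage_index, shnum)]  # ...and GET it here


if __name__ == "__main__":
    # The calling code is identical whichever backend is plugged in.
    for backend in (DiskBackend(tempfile.mkdtemp()), InMemoryObjectStore()):
        backend.put_share("si-example", 0, b"share bytes")
        assert backend.get_share("si-example", 0) == b"share bytes"
```

In a real cloud backend, the dictionary operations would become network requests to the storage service. Those requests can fail or be slow, which the server code would then have to handle.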
## Share Migration
When uploading a file to a grid, Tahoe-LAFS will make sure that the file is
healthy (a good discussion of what healthy means is found in #778) before
@ -66,19 +94,19 @@ file health and accounting, so prospective students would do well to explore
those open issues, too. A good accounting jumping-off point is #666. A good
jumping-off point for health is #778.
## Secure Decentralized Wiki
Write a wiki in Google's ["caja"](http://code.google.com/p/google-caja/) dialect of JavaScript. This wiki will load and store data directly on a Tahoe-LAFS storage grid, so that it is a full "Cloud App": there is no server. All computation is done in the user's web browser in caja, and all storage is handled by the decentralized Tahoe-LAFS storage grid. The wiki should leverage Tahoe-LAFS's secure sharing features to offer fine-grained, dynamic, and easy transclusion and client-side mashups. This project is intended to be the successor to [the TiddlyWiki-on-Tahoe-LAFS project](http://allmydata.org/trac/tiddly_on_tahoe), a wiki written in JavaScript and hosted on Tahoe-LAFS. That wiki was "bolted on" to Tahoe-LAFS rather than designed for it, and it is currently incapable of good transclusions or mashups.
To get started, play with [the TiddlyWiki-on-Tahoe-LAFS quick start](http://testgrid.allmydata.org:3567/uri/URI:DIR2-RO:7h7syiurogz5erc2au74tjwguu:h7bdxvjtvidlkcdbld3j2d5sbgyzsbqs7wdnu6yznqrejzssc5za/wiki.html), read the source code of [the HTTPSavingPlugin](http://allmydata.org/trac/tiddly_on_tahoe/browser/tahoe_tiddly/HTTPSavingPlugin.js) and [the TahoePlugin](http://allmydata.org/trac/tiddly_on_tahoe/browser/tahoe_tiddly/TahoePlugin.js) for TiddlyWiki, and experiment with [writing live caja applets](http://caja.appspot.com/).
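A browser-side wiki would most likely reach the grid through a Tahoe-LAFS gateway's HTTP web API. The sketch below only builds the HTTP requests for saving and loading a page stored as a file inside a wiki directory. It does not send them. It is written in Python rather than caja so that all examples here share one language. Several details are assumptions: the gateway address `127.0.0.1:3456`, the placeholder directory capability, and the helper names. It also assumes the gateway's `PUT /uri/$DIRCAP/NAME` and `GET /uri/$DIRCAP/NAME` request forms; check the gateway's web API documentation for their exact semantics.

```python
from urllib.parse import quote
from urllib.request import Request

GATEWAY = "http://127.0.0.1:3456"  # assumed address of a local Tahoe-LAFS gateway


def page_url(dircap: str, page_name: str) -> str:
    # Capability strings contain ':' (safe in a URL path); the page name is percent-encoded.
    return f"{GATEWAY}/uri/{quote(dircap, safe=':')}/{quote(page_name, safe='')}"


def save_page_request(dircap: str, page_name: str, text: str) -> Request:
    """Build a PUT that stores the page text as a file in the wiki directory."""
    return Request(page_url(dircap, page_name), data=text.encode("utf-8"), method="PUT")


def load_page_request(dircap: str, page_name: str) -> Request:
    """Build a GET that reads the page back."""
    return Request(page_url(dircap, page_name), method="GET")
```

Against a running gateway, `urllib.request.urlopen(save_page_request(dircap, "Home Page", text))` would send the request. Handing someone a read-only capability for the same directory would let them read pages but not edit them.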
## Cloud Apps
Difficulty: easy to hard, depending on project choice and how far you want to push it
Invent your own Summer-of-Code project by building a new web app on top of Tahoe-LAFS. The [Secure Decentralized Wiki](#secure-decentralized-wiki) is one example of a Cloud App. See [GSoCIdeas/CloudApps](GSoCIdeas/CloudApps) for other ideas.
## WebDAV Support
Difficulty: medium to hard, depending on how much of an existing WebDAV implementation you are able to reuse
@ -129,12 +157,11 @@ that you may need to take into account.
[Tickets labelled 'webdav'](http://allmydata.org/trac/tahoe-lafs/query?status=!closed&order=priority&keywords=~webdav)
## Distributed Introduction
Implement a protocol for distributed introduction, thus removing the only remaining Single Point of Failure (SPoF) in the Tahoe-LAFS system. For details see [ticket #68](/tahoe-lafs/trac-2024-07-25/issues/68#issuecomment-102558) which describes the distributed notification algorithm and points to the relevant source code.
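To see why a single introducer is a single point of failure, consider a client that learns about storage servers from several introducers and merges what they report. It keeps discovering servers as long as any one introducer answers. The sketch below is a deliberately naive illustration of that property, not the distributed notification algorithm described in ticket #68. The announcement fields and the `collect_announcements` helper are hypothetical, and each introducer is modelled as a plain callable.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical announcement shape, e.g. {"nodeid": "...", "furl": "..."}.
Announcement = Dict[str, str]


def collect_announcements(
    introducers: Iterable[Callable[[], List[Announcement]]],
) -> Dict[str, Announcement]:
    """Merge server announcements from every reachable introducer, keyed by node id."""
    servers: Dict[str, Announcement] = {}
    for fetch in introducers:
        try:
            batch = fetch()
        except ConnectionError:
            continue  # one unreachable introducer must not stop discovery
        for ann in batch:
            servers[ann["nodeid"]] = ann  # the same server heard twice is kept once
    return servers
```

A real protocol also has to handle introducers that disagree, stale announcements, and how introducers find each other. This sketch ignores all of these.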
## DVCS Integration
Write patches for the [git](http://git-scm.com/) or [darcs](http://darcs.net) distributed revision control tool so that it reads and writes directly to a Tahoe-LAFS storage grid instead of its local filesystem. This creates a "revision control repository in the sky"—a repository that is distributed, fault-tolerant, and highly available. It also lends Tahoe-LAFS's unique security and access-control properties to your revision control system—you can share read-only access or read-write access with specific people through Tahoe-LAFS's capability access control system, and you can rely on the integrated digital signatures to verify that you are reading an authorized version of the repository.
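Git's object store is content-addressed: a blob's ID is the SHA-1 of the header `blob <size>`, a NUL byte, and the content. This suggests one possible design, in which each object becomes a write-once file on the grid named by its ID. The sketch below computes real git blob IDs, but it stores raw bytes in a dictionary. That dictionary (`GridObjectStore`, a hypothetical name) stands in for a directory on a Tahoe-LAFS grid. Git itself stores objects zlib-compressed, and that step is omitted here. Mutable state such as branch refs would need separate handling, presumably with mutable files.

```python
import hashlib
from typing import Dict


def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to a blob: SHA-1 over b'blob <size>' + NUL + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class GridObjectStore:
    """Hypothetical stand-in for a grid directory that holds git objects by ID."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def put_blob(self, content: bytes) -> str:
        oid = git_blob_id(content)
        self.files.setdefault(oid, content)  # identical content is stored only once
        return oid

    def get_blob(self, oid: str) -> bytes:
        content = self.files[oid]
        if git_blob_id(content) != oid:  # content addressing lets readers self-check
            raise ValueError(f"object {oid} failed its hash check")
        return content
```

Because IDs are derived from content, the same blob uploaded twice maps to the same name. A reader can also check every object it fetches against that name.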