new version

[Imported from Trac: page KnownIssues, version 3]
2008-06-10 23:27:18 +00:00 · 2008-06-10 23:27:18 +00:00 · 1e6b888e60
parent b4964c497e
commit 1e6b888e60
1 changed files with 135 additions and 53 deletions
--- a/KnownIssues.md
+++ b/KnownIssues.md
@@ -1,71 +1,153 @@
 # Known Issues
-This page describes known problems for recent releases of Tahoe. Issues are
+Below is a list of known issues in recent releases of Tahoe, and how to manage
-fixed as quickly as possible, however users of older releases may still need
+them.
 to be aware of these problems until they upgrade to a release which resolves
 it.
 ## Issues in Tahoe [1.1](/tahoe-lafs/trac-2024-07-25/milestone/127) (not quite released)
-### Servers which run out of space
+## issues in Tahoe v1.1.0, released 2008-06-10
-If a Tahoe storage server runs out of space, writes will fail with an
+### issue 1: server out of space when writing mutable file
 `IOError` exception. In some situations, Tahoe-1.1 clients will not react
 to this very well:
- * If the exception occurs during an immutable-share write, that share will
+If a v1.0 or v1.1.0 storage server runs out of disk space then its attempts to
-   be broken. The client will detect this, and will declare the upload as
+write data to the local filesystem will fail.  For immutable files, this will
-   failing if insufficient shares can be placed (this "shares of happiness"
+not lead to any problem (the attempt to upload that share to that server will
-   threshold defaults to 7 out of 10). The code does not yet search for new
+fail, the partially uploaded share will be deleted from the storage server's
-   servers to replace the full ones. If the upload fails, the server's
+"incoming shares" directory, and the client will move on to using another
-   upload-already-in-progress routines may interfere with a subsequent
+storage server instead).
   upload.
 * If the exception occurs during a mutable-share write, the old share will
   be left in place (and a new home for the share will be sought). If enough
   old shares are left around, subsequent reads may see the file in its
   earlier state, known as a "rollback" fault. Writing a new version of the
   file should find the newer shares correctly, although it will take
   longer (more roundtrips) than usual.
-The out-of-space handling code is not yet complete, and we do not yet have a
+If the write was an attempt to modify an existing mutable file, however, a
-space-limiting solution that is suitable for large storage nodes. The
+problem will result: when the attempt to write the new share fails due to
-"sizelimit" configuration uses a /usr/bin/du -style query at node startup,
+insufficient disk space, then it will be aborted and the old share will be left
-which takes a long time (tens of minutes) on storage nodes that offer 100GB
+in place.  If enough such old shares are left, then a subsequent read may get
-or more, making it unsuitable for highly-available servers.
+those old shares and see the file in its earlier state, which is a "rollback"
 failure.  With the default parameters (3-of-10), six old shares will be enough
 to potentially lead to a rollback failure.
-In lieu of 'sizelimit', server admins are advised to set the
+#### how to manage it
 NODEDIR/readonly_storage (and remove 'sizelimit', and restart their nodes) on
 their storage nodes before space is exhausted. This will stop the influx of
 immutable shares. Mutable shares will continue to arrive, but since these are
 mainly used by directories, the amount of space consumed will be smaller.
-Eventually we will have a better solution for this.
+Make sure your Tahoe storage servers don't run out of disk space.  This means
 refusing storage requests before the disk fills up. There are a couple of ways
 to do that with v1.1.
-== Issues in Tahoe 1.0 ==
+First, there is a configuration option named "sizelimit" which will cause the
 storage server to do a "du" style recursive examination of its directories at
 startup, and then if the sum of the size of files found therein is greater than
 the "sizelimit" number, it will reject requests by clients to write new
 immutable shares.
-=== Servers which run out of space ===
+However, that can take a long time (something on the order of a minute of
 examination of the filesystem for each 10 GB of data stored in the Tahoe
 server), and the Tahoe server will be unavailable to clients during that time.
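The "du"-style startup check described for `sizelimit` can be pictured with the sketch below. It is an illustration only, not Tahoe's actual code: the function names `directory_usage` and `accepts_new_immutable_shares`, and the assumption that `sizelimit` is a plain byte count, are made up for this example.

```python
import os


def directory_usage(root):
    """Return the summed size in bytes of every non-directory entry under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat() so that symlinks are measured, not followed
                total += os.lstat(path).st_size
            except OSError:
                # the file vanished between listing and stat; skip it
                continue
    return total


def accepts_new_immutable_shares(storage_root, sizelimit):
    """Hypothetical policy: refuse new immutable shares once usage exceeds sizelimit."""
    return directory_usage(storage_root) <= sizelimit
```

Because every file has to be visited, the cost of a walk like this grows with the amount of data stored, which is consistent with the startup delay described above.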
-In addition to the problems described above, Tahoe-1.0 clients which
+Another option is to set the "readonly_storage" configuration option on the
-experience out-of-space errors while writing mutable files are likely to
+storage server before startup.  This will cause the storage server to reject
-think the write succeeded, when it in fact failed. This can cause data loss.
+all requests to upload new immutable shares.
-=== Large Directories or Mutable files in a specific range of sizes ===
+Note that neither of these configurations affects mutable shares: even if
 sizelimit is configured and the storage server currently has greater space used
 than allowed, or even if readonly_storage is configured, servers will continue
 to accept new mutable shares and will continue to accept requests to overwrite
 existing mutable shares.
-A mismatched pair of size limits causes a problem when a client attempts to
+Mutable files are typically used only for directories, and are usually much
-upload a large mutable file with a size between 3139275 and 3500000 bytes.
+smaller than immutable files, so if you use one of these configurations to stop
-(Mutable files larger than 3.5MB are refused outright). The symptom is very
+the influx of immutable files while there is still sufficient disk space to
-high memory usage (3GB) and 100% CPU for about 5 minutes. The attempted write
+receive an influx of (much smaller) mutable files, you may be able to avoid the
-will fail, but the client may think that it succeeded. This size corresponds
+potential for "rollback" failure.
 to roughly 9000 entries in a directory.
-This was fixed in 1.1, as ticket #379. Files up to 3.5MB should now work
+A future version of Tahoe will include a fix for this issue.  Here is
-properly, and files above that size should be rejected properly. Both servers
+[the mailing list
-and clients must be upgraded to resolve the problem, although once the client
+discussion](http://allmydata.org/pipermail/tahoe-dev/2008-May/000630.html) about how that future version will work.
 is upgraded to 1.1 the memory usage and false-success problems should be
 fixed.
 === pycryptopp compile errors resulting in corruption ===
-Certain combinations of compiler, linker, and pycryptopp versions may cause
+## issues in Tahoe v1.1.0 and v1.0.0
 corruption errors during decryption, resulting in corrupted plaintext.
 ### issue 2: pyOpenSSL and/or Twisted defect resulting in false alarms in the unit tests
 The combination of Twisted v8.1.0 and pyOpenSSL v0.7 causes the Tahoe v1.1 unit
 tests to fail, even though the behavior of Tahoe itself which is being tested is
 correct.
 #### how to manage it
 If you are using Twisted v8.1.0 and pyOpenSSL v0.7, then please ignore XYZ in
 XYZ.  Downgrading to an older version of Twisted or pyOpenSSL will cause those
 false alarms to stop happening.
 ## issues in Tahoe v1.0.0, released 2008-03-25
 (Tahoe v1.0 was superseded by v1.1 which was released 2008-06-10.)
 ### issue 3: server out of space when writing mutable file
 In addition to the problems caused by insufficient disk space described above,
 v1.0 clients which are writing mutable files when the servers fail to write to
 their filesystem are likely to think the write succeeded, when it in fact
 failed. This can cause data loss.
 #### how to manage it
 Upgrade the client to v1.1, or make sure that servers are always able to write to
 their local filesystem (including that there is space available) as described in
 "issue 1" above.
 ### issue 4: server out of space when writing immutable file
 Tahoe v1.0 clients which are using v1.0 servers that are unable to write to their
 filesystem during an immutable upload will correctly detect the first failure,
 but if they retry the upload without restarting the client, or if another client
 attempts to upload the same file, the second upload may appear to succeed when
 it hasn't, which can lead to data loss.
 #### how to manage it
 Upgrading either or both of the client and the server to v1.1 will fix this
 issue.  Also it can be avoided by ensuring that the servers are always able to
 write to their local filesystem (including that there is space available) as
 described in "issue 1" above.
 ### issue 5: large directories or mutable files in a specific range of sizes
 If a client attempts to upload a large mutable file with a size greater than
 about 3,139,000 and less than or equal to 3,500,000 bytes then it will fail but
 appear to succeed, which can lead to data loss.
 (Mutable files larger than 3,500,000 bytes are refused outright.)  The symptom of the
 failure is very high memory usage (3 GB of memory) and 100% CPU for about 5
 minutes, before it appears to succeed, although it hasn't.
 Directories are stored in mutable files, and a directory of approximately 9000
 entries may fall into this range of mutable file sizes (depending on the size of
 the filenames or other metadata associated with the entries).
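The size ranges described for issue 5 can be written down as a small classifier. This is only a sketch of the behaviour described on this page for a v1.0 client, not code from Tahoe. The function name and the returned strings are hypothetical, and the lower bound is approximate: the value 3139275 is taken from the earlier revision of this page.

```python
# Limits quoted on this page for Tahoe v1.0; the lower bound is approximate.
FALSE_SUCCESS_ABOVE = 3139275  # bytes, figure from the earlier page revision
MUTABLE_SIZE_LIMIT = 3500000   # bytes; larger mutable files are refused


def v1_0_mutable_write_outcome(size):
    """Classify a v1.0 mutable-file write by size, per the issue 5 description."""
    if size > MUTABLE_SIZE_LIMIT:
        return "refused"
    if size > FALSE_SUCCESS_ABOVE:
        return "fails but appears to succeed"
    return "ok"
```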
 #### how to manage it
 This was fixed in v1.1, under ticket #379.  If the client is upgraded to v1.1,
 then it will fail cleanly instead of falsely appearing to succeed when it tries
 to write a file whose size is in this range.  If the server is also upgraded to
 v1.1, then writes of mutable files whose size is in this range will succeed.
 (If the server is upgraded to v1.1 but the client is still v1.0 then the client
 will still suffer this failure.)
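The upgrade combinations described above for issue 5 can be summarised as a lookup. This is a sketch for illustration: the function name, the version strings `"1.0"` and `"1.1"`, and the outcome labels are assumptions made for this example, not identifiers from Tahoe.

```python
def issue5_outcome(client_version, server_version):
    """Outcome of writing a mutable file whose size is in the affected range."""
    known = ("1.0", "1.1")
    if client_version not in known or server_version not in known:
        raise ValueError("only versions 1.0 and 1.1 are covered by this sketch")
    if client_version == "1.0":
        # a v1.0 client still suffers the failure, whatever the server runs
        return "false success"
    if server_version == "1.0":
        # a v1.1 client talking to a v1.0 server fails cleanly
        return "clean failure"
    # both sides upgraded: the write succeeds
    return "success"
```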
 ### issue 6: pycryptopp defect resulting in data corruption
 Versions of pycryptopp earlier than pycryptopp-0.5.0 had a defect which, when
 compiled with some compilers, would cause AES-256 encryption and decryption to
 be computed incorrectly.  This could cause data corruption.  Tahoe v1.0
 required, and came with a bundled copy of, pycryptopp v0.3.
 #### how to manage it
 You can detect whether pycryptopp-0.3 has this failure when it is compiled by
 your compiler.  Run the unit tests that come with pycryptopp-0.3: unpack the
 "pycryptopp-0.3.tar" file that comes in the Tahoe v1.0 `misc/dependencies`
 directory, cd into the resulting `pycryptopp-0.3.0` directory, and execute
 `python ./setup.py test`.  If the tests pass, then your compiler does not
 trigger this failure.
 Tahoe v1.1 requires, and comes with a bundled copy of, pycryptopp v0.5.1, which
 does not have this defect.