From 1e6b888e6033ea20b610a75550af03a1d8fa75ac Mon Sep 17 00:00:00 2001
From: zooko <>
Date: Tue, 10 Jun 2008 23:27:18 +0000
Subject: [PATCH] new version

[Imported from Trac: page KnownIssues, version 3]
---
 KnownIssues.md | 188 +++++++++++++++++++++++++++++++++++--------------
 1 file changed, 135 insertions(+), 53 deletions(-)

diff --git a/KnownIssues.md b/KnownIssues.md
index 9836609..09d27fc 100644
--- a/KnownIssues.md
+++ b/KnownIssues.md
@@ -1,71 +1,153 @@
 # Known Issues
 
-This page describes known problems for recent releases of Tahoe. Issues are
-fixed as quickly as possible, however users of older releases may still need
-to be aware of these problems until they upgrade to a release which resolves
-it.
+Below is a list of known issues in recent releases of Tahoe, and how to manage
+them.
 
-## Issues in [1.1 [/tahoe-lafs/trac-2024-07-25/milestone/127](/tahoe-lafs/trac-2024-07-25/milestone/127)]Tahoe (not quite released)
-### Servers which run out of space
+## issues in Tahoe v1.1.0, released 2008-06-10
 
-If a Tahoe storage server runs out of space, writes will fail with an
-`IOError` exception. In some situations, Tahoe-1.1 clients will not react
-to this very well:
+### issue 1: server out of space when writing mutable file
 
- * If the exception occurs during an immutable-share write, that share will
-   be broken. The client will detect this, and will declare the upload as
-   failing if insufficient shares can be placed (this "shares of happiness"
-   threshold defaults to 7 out of 10). The code does not yet search for new
-   servers to replace the full ones. If the upload fails, the server's
-   upload-already-in-progress routines may interfere with a subsequent
-   upload.
- * If the exception occurs during a mutable-share write, the old share will
-   be left in place (and a new home for the share will be sought). If enough
-   old shares are left around, subsequent reads may see the file in its
-   earlier state, known as a "rollback" fault. Writing a new version of the
-   file should find the newer shares correctly, although it will take
-   longer (more roundtrips) than usual.
+If a v1.0 or v1.1.0 storage server runs out of disk space then its attempts to
+write data to the local filesystem will fail. For immutable files, this will
+not lead to any problem (the attempt to upload that share to that server will
+fail, the partially uploaded share will be deleted from the storage server's
+"incoming shares" directory, and the client will move on to using another
+storage server instead).
 
-The out-of-space handling code is not yet complete, and we do not yet have a
-space-limiting solution that is suitable for large storage nodes. The
-"sizelimit" configuration uses a /usr/bin/du -style query at node startup,
-which takes a long time (tens of minutes) on storage nodes that offer 100GB
-or more, making it unsuitable for highly-available servers.
+If the write was an attempt to modify an existing mutable file, however, a
+problem will result: when the attempt to write the new share fails due to
+insufficient disk space, it will be aborted and the old share will be left
+in place. If enough such old shares are left, then a subsequent read may get
+those old shares and see the file in its earlier state, which is a "rollback"
+failure. With the default parameters (3-of-10), six old shares will be enough
+to potentially lead to a rollback failure.
 
-In lieu of 'sizelimit', server admins are advised to set the
-NODEDIR/readonly_storage (and remove 'sizelimit', and restart their nodes) on
-their storage nodes before space is exhausted. This will stop the influx of
-immutable shares. Mutable shares will continue to arrive, but since these are
-mainly used by directories, the amount of space consumed will be smaller.
+#### how to manage it
 
-Eventually we will have a better solution for this.
+Make sure your Tahoe storage servers don't run out of disk space. This means
+refusing storage requests before the disk fills up. There are a couple of ways
+to do that with v1.1.
 
-== Issues in Tahoe 1.0 ==
+First, there is a configuration option named "sizelimit" which will cause the
+storage server to do a "du" style recursive examination of its directories at
+startup, and then if the sum of the sizes of the files found therein is greater
+than the "sizelimit" number, it will reject requests by clients to write new
+immutable shares.
 
-=== Servers which run out of space ===
+However, that can take a long time (something on the order of a minute of
+examination of the filesystem for each 10 GB of data stored in the Tahoe
+server), and the Tahoe server will be unavailable to clients during that time.
 
-In addition to the problems described above, Tahoe-1.0 clients which
-experience out-of-space errors while writing mutable files are likely to
-think the write succeeded, when it in fact failed. This can cause data loss.
+Another option is to set the "readonly_storage" configuration option on the
+storage server before startup. This will cause the storage server to reject
+all requests to upload new immutable shares.
 
-=== Large Directories or Mutable files in a specific range of sizes ===
+Note that neither of these configurations affects mutable shares: even if
+sizelimit is configured and the storage server is already using more space
+than allowed, or even if readonly_storage is configured, servers will continue
+to accept new mutable shares and will continue to accept requests to overwrite
+existing mutable shares.
 
-A mismatched pair of size limits causes a problem when a client attempts to
-upload a large mutable file with a size between 3139275 and 3500000 bytes.
-(Mutable files larger than 3.5MB are refused outright). The symptom is very
-high memory usage (3GB) and 100% CPU for about 5 minutes. The attempted write
-will fail, but the client may think that it succeeded. This size corresponds
-to roughly 9000 entries in a directory.
+Mutable files are typically used only for directories, and are usually much
+smaller than immutable files, so if you use one of these configurations to stop
+the influx of immutable files while there is still sufficient disk space to
+receive an influx of (much smaller) mutable files, you may be able to avoid the
+potential for "rollback" failure.
 
-This was fixed in 1.1, as ticket #379. Files up to 3.5MB should now work
-properly, and files above that size should be rejected properly. Both servers
-and clients must be upgraded to resolve the problem, although once the client
-is upgraded to 1.1 the memory usage and false-success problems should be
-fixed.
+A future version of Tahoe will include a fix for this issue. Here is
+[the mailing list
+discussion](http://allmydata.org/pipermail/tahoe-dev/2008-May/000630.html)
+about how that future version will work.
 
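+For example, here is one way to apply the "readonly_storage" option described
+above. This is only a sketch: it assumes that the option is enabled by
+creating a file named "readonly_storage" in the storage server's base
+directory (called NODEDIR here), and that the node is then restarted.
+
+```sh
+# Mark the storage server read-only for immutable shares: the presence of this
+# (empty) file is assumed to be what enables the readonly_storage option.
+touch NODEDIR/readonly_storage
+
+# Restart the node so that it picks up the new setting; use whatever command
+# you normally use to restart your node, for example:
+tahoe restart NODEDIR
+```
 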
-=== pycryptopp compile errors resulting in corruption ===
-Certain combinations of compiler, linker, and pycryptopp versions may cause
-corruption errors during decryption, resulting in corrupted plaintext.
+## issues in Tahoe v1.1.0 and v1.0.0
+### issue 2: pyOpenSSL and/or Twisted defect resulting in false alarms in the unit tests
+
+The combination of Twisted v8.1.0 and pyOpenSSL v0.7 causes the Tahoe v1.1 unit
+tests to fail, even though the Tahoe behavior being tested is correct.
+
+#### how to manage it
+
+If you are using Twisted v8.1.0 and pyOpenSSL v0.7, then please ignore XYZ in
+XYZ. Downgrading to an older version of Twisted or pyOpenSSL will cause those
+false alarms to stop happening.
+
+
+## issues in Tahoe v1.0.0, released 2008-03-25
+
+(Tahoe v1.0 was superseded by v1.1, which was released 2008-06-10.)
+
+### issue 3: server out of space when writing mutable file
+
+In addition to the problems caused by insufficient disk space described above,
+v1.0 clients which are writing mutable files when the servers fail to write to
+their filesystem are likely to think the write succeeded, when it in fact
+failed. This can cause data loss.
+
+#### how to manage it
+
+Upgrade the client to v1.1, or make sure that servers are always able to write
+to their local filesystem (including that there is space available) as
+described in "issue 1" above.
+
+
+### issue 4: server out of space when writing immutable file
+
+Tahoe v1.0 clients which are using v1.0 servers that are unable to write to
+their filesystem during an immutable upload will correctly detect the first
+failure, but if they retry the upload without restarting the client, or if
+another client attempts to upload the same file, the second upload may appear
+to succeed when it hasn't, which can lead to data loss.
+
+#### how to manage it
+
+Upgrading either or both of the client and the server to v1.1 will fix this
+issue. It can also be avoided by ensuring that the servers are always able to
+write to their local filesystem (including that there is space available) as
+described in "issue 1" above.
+
+
+### issue 5: large directories or mutable files in a specific range of sizes
+
+If a client attempts to upload a large mutable file with a size greater than
+about 3,139,000 bytes and less than or equal to 3,500,000 bytes then it will
+fail but appear to succeed, which can lead to data loss.
+
+(Mutable files larger than 3,500,000 bytes are refused outright.) The symptom
+of the failure is very high memory usage (3 GB of memory) and 100% CPU for
+about 5 minutes, before it appears to succeed, although it hasn't.
+
+Directories are stored in mutable files, and a directory of approximately 9000
+entries may fall into this range of mutable file sizes (depending on the size
+of the filenames or other metadata associated with the entries).
+
+#### how to manage it
+
+This was fixed in v1.1, under ticket #379. If the client is upgraded to v1.1,
+then it will fail cleanly instead of falsely appearing to succeed when it tries
+to write a file whose size is in this range. If the server is also upgraded to
+v1.1, then writes of mutable files whose size is in this range will succeed.
+(If the server is upgraded to v1.1 but the client is still v1.0 then the client
+will still suffer this failure.)
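+
+Issues 1, 3, and 4 above all come down to making sure that a storage server
+never runs out of disk space. One simple way to keep an eye on that is
+sketched below; it is only a sketch, and it assumes that the server's shares
+live under NODEDIR/storage, which may not match your installation.
+
+```sh
+#!/bin/sh
+# Warn (for example from cron) when the partition holding the storage server's
+# shares is nearly full, so that readonly_storage can be enabled in time.
+LIMIT=90   # percent of the partition that may be used before we warn
+USED=$(df -P NODEDIR/storage | awk 'NR==2 { sub("%", "", $5); print $5 }')
+if [ "$USED" -gt "$LIMIT" ]; then
+    echo "storage server partition is ${USED}% full -- consider readonly_storage"
+fi
+```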
+
+
+### issue 6: pycryptopp defect resulting in data corruption
+
+Versions of pycryptopp earlier than pycryptopp-0.5.0 had a defect which, when
+compiled with some compilers, would cause AES-256 encryption and decryption to
+be computed incorrectly. This could cause data corruption. Tahoe v1.0
+required, and came with a bundled copy of, pycryptopp v0.3.
+
+#### how to manage it
+
+You can detect whether the copy of pycryptopp-0.3 built by your compiler has
+this failure by running the unit tests that come with pycryptopp-0.3: unpack
+the "pycryptopp-0.3.tar" file that comes in the Tahoe v1.0 `misc/dependencies`
+directory, cd into the resulting `pycryptopp-0.3.0` directory, and execute
+`python ./setup.py test`. If the tests pass, then your compiler does not
+trigger this failure.
+
+Tahoe v1.1 requires, and comes with a bundled copy of, pycryptopp v0.5.1, which
+does not have this defect.
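+
+Expressed as shell commands, the check described above looks roughly like the
+following. This is a sketch: it is run from the top of an unpacked Tahoe v1.0
+source tree, and the exact tarball and directory names may differ slightly in
+your copy.
+
+```sh
+cd misc/dependencies        # the dependencies bundled with Tahoe v1.0
+tar xf pycryptopp-0.3.tar   # unpack the bundled pycryptopp
+cd pycryptopp-0.3.0
+python ./setup.py test      # if the tests pass, your compiler is unaffected
+```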