large directories take a long time to modify #383

Open
opened 2008-04-12 18:58:21 +00:00 by warner · 3 comments
warner commented 2008-04-12 18:58:21 +00:00
Owner

We found that the prodnet webapi servers were taking about 35 seconds to modify a large (about 10k entries) dirnode. That time is measured from the end of the Retrieve to the beginning of the Publish. We're pretty sure this is because the loop that decrypts and verifies the write-cap in each row is in Python (whereas decrypting the mutable file contents as a whole, in a single pycryptopp call, runs in 8 milliseconds). The other loop, which re-encrypts everything, takes a similar amount of time, so each loop probably accounts for about 17 seconds.

We don't actually need to decrypt the whole thing. Most of the modifications we're doing add or replace specific children. Since the dirnode is represented as a concatenation of netstrings (one per child), we could have a loop that walks the string, reads each netstring's length prefix, extracts the child name, checks whether it matches, and skips ahead to the next child if not. This would yield a big string of everything before the match, the match itself, and a big string of everything after the match. We could then modify the small matching piece and concatenate everything back together when we're done. Only the piece we're changing needs to be decrypted and re-encrypted.
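
To make that concrete, here is a minimal sketch of the scan, assuming the dirnode body is a plain concatenation of netstrings of the form `<len>:<bytes>,` (one per child) and that each child's payload begins with a netstring holding the encoded child name. The real on-disk entry layout may differ, and `netstring`/`parse_netstring`/`replace_child` are illustrative names, not existing Tahoe functions:

```python
def netstring(payload):
    """Encode bytes as a netstring: b'<len>:<payload>,'."""
    return b"%d:%s," % (len(payload), payload)

def parse_netstring(data, offset):
    """Return (payload, offset_after) for the netstring starting at offset."""
    colon = data.index(b":", offset)
    length = int(data[offset:colon])
    start = colon + 1
    end = start + length
    if data[end:end + 1] != b",":
        raise ValueError("malformed netstring")
    return data[start:end], end + 1

def replace_child(packed, target_name, new_entry):
    """Splice a new packed entry in place of the child named target_name.

    Everything before and after the matching entry stays as opaque bytes;
    only the matched entry would need to be decrypted, modified,
    re-encrypted, and re-packed (here we just splice in new_entry).
    """
    offset = 0
    while offset < len(packed):
        entry, next_offset = parse_netstring(packed, offset)
        name, _ = parse_netstring(entry, 0)   # assumed: first field is the child name
        if name == target_name:
            return packed[:offset] + new_entry + packed[next_offset:]
        offset = next_offset
    raise KeyError(target_name)

# Hypothetical usage, with opaque ciphertext blobs standing in for the real fields:
# packed = (netstring(netstring(b"foo") + b"<ciphertext>") +
#           netstring(netstring(b"bar") + b"<ciphertext>"))
# packed = replace_child(packed, b"bar", netstring(netstring(b"bar") + b"<new>"))
```

With this approach the per-child decrypt/verify/re-encrypt work drops from O(all children) to O(1); the rest of the dirnode is copied around as uninterpreted bytes.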

In addition, we could probably get rid of the HMAC on those writecaps now; I think they're leftover from the central-vdrive-server days. But we should put that compatibility break off until we move to DSA directories (if we choose to go with the 'deep-verify' caps).

tahoe-lafs added the major, enhancement, 1.0.0 labels 2008-04-12 18:58:21 +00:00
tahoe-lafs added this to the eventually milestone 2008-04-12 18:58:21 +00:00
tahoe-lafs added the code-dirnodes label 2008-04-24 23:50:10 +00:00
zooko commented 2009-05-04 16:53:43 +00:00
Author
Owner

See also #327 (performance measurement of directories), #414 (profiling on directory unpacking), and #329 (dirnodes could cache encrypted/serialized entries for speed).

zooko commented 2009-06-25 16:30:40 +00:00
Author
Owner

Tahoe-LAFS hasn't checked the HMAC since changeset:f1fbd4feae1fb5d7 (2008-12-21), a patch first released in Tahoe-LAFS v1.3.0 on 2009-02-13.

If we produced dirnode entries that didn't have the HMAC tag (or that had blank space instead of the correct tag bytes there -- I don't know how the parsing works), then clients older than v1.3.0 would get some sort of integrity error when trying to read that entry. Our backward-compatibility tradition typically covers a longer span than that. For example, [the most recent release notes]source:relnotes.txt@20090414025430-92b7f-6e06ebbd16f80e68a6141d44fc25cc1d49726b22 say that Tahoe-LAFS v1.4.1 is backwards-compatible with v1.0, and in fact it is actually compatible with v0.8 or so (unless you try to upload large files -- files with shares larger than about 4 GiB).

So, let's not yet break compatibility by ceasing to emit the HMAC tags.

Also, let this be a lesson to us that if we notice forward-compatibility issues and fix them early, this frees us up to evolve the protocols sooner. We actually stopped *needing* the HMAC tags when we released Tahoe-LAFS v0.7 on 2008-01-07, but we didn't notice that we were still checking them and erroring out if they were wrong until the v1.3.0 release. So, everybody go look at [forward-compatibility issues](http://allmydata.org/trac/tahoe/search?q=forward-compatibility) and fix them!

zooko commented 2009-06-25 16:39:02 +00:00
Author
Owner

Oh, by the way, the time to actually compute and write the HMAC tags is really tiny compared to the other performance issues. (#327 (performance measurement of directories) and #414 (profiling on directory unpacking) are how we can be sure of this.) If we could stop producing the HMAC tags, I would be happier about the simplification than about the speed-up...
