web upload uses up lots of RAM #29

Closed

opened 2007-05-03 00:13:13 +00:00 by zooko · 19 comments

zooko commented

2007-05-03 00:13:13 +00:00

Uploading files through the 'webish' frontend (with the upload form) results in a memory footprint of at least 2 * filesize. Downloading files might do the same.

Zooko's first observations suggest this might be more like 4x.

The main culprit seems to be the stdlib 'cgi' module, which twisted.web uses to parse the multipart-encoded upload form. The file to be uploaded appears as an input field in this form.

A secondary thing to look at (if/when we fix the upload side) is to make the download side streaming (producer/consumer), to avoid buffering the whole file in the twisted Transport queue.

zooko added the

c/code

p/major

t/defect

labels 2007-05-03 00:13:13 +00:00

zooko commented

2007-05-03 14:45:33 +00:00

Author

The core encoding mechanism in Tahoe has been designed to use up a small amount of RAM which does not grow at all as the size of the input file grows. My first guess as to what is using up all this RAM is that the file is being transferred over HTTP from the web browser to the node and then stored in RAM in the node before being encoded.

zooko commented

2007-05-04 22:21:40 +00:00

Author

So to be more specific, if I understand correctly that the file is being uploaded by the web browser to the node, then the fix should probably be for the node to encode the file and upload the shares to the blockservers as the file is being received by the node, so that the node doesn't actually store more than a small segment of the file in RAM. This also implies that the node (acting as web server) has to be able to read in only as much of the file as it is ready to encode and upload, and leave the rest waiting, rather than greedily read in the entire file at once.

zooko commented

2007-05-23 22:29:42 +00:00

Author

I'm upgrading the priority and assigning this to myself, because now that #22 is fixed, this issue is preventing me from sharing large files with my friends and family.

zooko added

p/critical

and removed

p/major

labels 2007-05-23 22:29:42 +00:00

zooko commented

2007-05-23 22:34:06 +00:00

Author

(http://pyramid.twistedmatrix.com/pipermail/twisted-web/2005-March/001315.html) makes me think that I'll need to rewrite webish to use twisted.web2 in order to do streaming upload. Reading further...

zooko commented

2007-05-23 22:48:40 +00:00

Author

I don't really know that it is 4x. Looking at the code, I guess that it is probably 1x plus a bit.

zooko commented

2007-05-23 22:49:26 +00:00

Author

How about this: there exist constants c1, c2, and n1 such that for all file sizes n > n1, the RAM usage is greater than c1*n and less than c2*n.
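
Written out, that statement is the usual definition of linear growth. Here $M(n)$ is a label introduced only for this note, meaning the RAM used while uploading a file of $n$ bytes, and the constants are assumed to be positive:

```latex
\exists\, c_1, c_2 > 0,\ \exists\, n_1 :\quad
\forall n > n_1,\quad c_1\, n < M(n) < c_2\, n
\qquad\Longleftrightarrow\qquad M(n) = \Theta(n)
```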

warner commented

2007-05-23 23:09:58 +00:00

Twisted's http server needs a non-trivial amount of work before it will be convenient for us to access the incoming data before the POST has completed. On the plus side, I believe that twisted.web writes large HTTP bodies to a temporary file (to avoid consuming a lot of memory).

http://twistedmatrix.com/trac/ticket/288 is relevant: once it is resolved, we should have access to the incoming file in small pieces. We can't use that, however, because our fileid/key-generating passes require access to the whole file. So at best we can reduce our memory footprint, but not our disk footprint. To really reduce the memory footprint, we'd need to use randomly-generated keys, give up on convergent encoding, use a randomly-generated 'storage index', and split the read-and-verify-crypttext capability (the verifierid) into read-crypttext (storage-index) and verify-crypttext (verifierid) capabilities. Not entirely unreasonable, mind you, but it would have significant impact on the mesh as a whole.

http://twistedmatrix.com/trac/ticket/1903 is also relevant: a POST that takes a very long time to complete will run afoul of the timeout. Fortunately it looks like the default value for this timeout is 12 hours.

I think that request.content is a filehandle that either references a StringIO (for small bodies) or a disk-based tempfile (for large bodies), so changing webish.py to use uploader.upload_filehandle(request.content) instead of upload_data(request.content.read()) would fix the memory problem on the upload side.

On the download side, I think the webish.WebDownloadTarget already does the desired streaming.

Of course, I should really finish implementing those memory-footprint tests so we could watch this memory usage drop once we make this upload_filehandle fix.
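
A minimal sketch of why handing over the filehandle helps, written in present-day Python rather than the 2007 codebase. `iter_segments` and `SEGMENT_SIZE` are hypothetical names, not Tahoe's `upload_filehandle`, and `SpooledTemporaryFile` only stands in for a body that lives in memory while small and on disk once large. The point is just that reading in bounded pieces never holds more than one piece at a time, whereas `.read()` builds the whole body as a single string.

```python
import tempfile

SEGMENT_SIZE = 64 * 1024  # hypothetical read size, chosen for illustration


def iter_segments(fh, segment_size=SEGMENT_SIZE):
    """Yield the contents of an open binary filehandle in bounded pieces."""
    while True:
        segment = fh.read(segment_size)
        if not segment:
            return
        yield segment


# Stand-in for request.content: kept in memory while small, spilled to disk
# once it grows past max_size.
content = tempfile.SpooledTemporaryFile(max_size=100 * 1024)
content.write(b"a" * 300_000)
content.seek(0)

total = 0
largest = 0
for segment in iter_segments(content):
    total += len(segment)  # an uploader would encode the segment here
    largest = max(largest, len(segment))
content.close()
```

Here `largest` never exceeds `SEGMENT_SIZE`, regardless of how big the body is.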

warner added the

v/0.2.0

label 2007-05-24 00:15:52 +00:00

zooko commented

2007-05-25 03:50:30 +00:00

Author

The memory-footprint tests are now ticket #54.

zooko commented

2007-05-27 14:47:07 +00:00

Author

Hey Brian: changeset:04b649f97127b9ce didn't fix the problem (although I think it might have helped a little -- I'm not sure). I'm going to pass this ticket over to you, but feel free to pass it back to me if you think you won't actually be motivated to work on it soon...

zooko removed their assignment 2007-05-27 14:47:07 +00:00

warner was assigned by zooko

2007-05-27 14:47:07 +00:00

warner commented

2007-05-30 01:08:57 +00:00

With changeset changeset:ea78b4b605568479, running 'make check-memory' gives some preliminary numbers.
They only cover upload, and they don't yet make a lot of sense.

The two most obvious places where we could consume memory roughly equal to the uploaded file are when we compute key/fileid/verifierid (allmydata.upload.Uploader.compute_id_strings, which uses a 64kB blocksize), and when we read in a segment for encoding: allmydata.encode.Encoder.MAX_SEGMENT_SIZE is 2MiB, which should result in 8MiB of footprint (we read crypttext in really tiny pieces, and encode it into segsize-sized shares with a 4x expansion).
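
For reference, a hashing pass over a file with a fixed blocksize looks roughly like the sketch below. This is not the actual `compute_id_strings`; `hash_filehandle` is a hypothetical name and SHA-256 is an assumption made only so the example runs. What it illustrates is that such a pass needs one 64kB block in memory at a time, so it should not by itself account for a footprint proportional to the file size.

```python
import hashlib
import io

BLOCKSIZE = 64 * 1024  # the 64kB blocksize mentioned above


def hash_filehandle(fh, blocksize=BLOCKSIZE):
    """Hash a binary file-like object block by block (SHA-256 is an assumption)."""
    hasher = hashlib.sha256()
    # iter() with a sentinel keeps calling fh.read until it returns b"".
    for block in iter(lambda: fh.read(blocksize), b""):
        hasher.update(block)
    return hasher.hexdigest()


data = b"tahoe" * 100_000
digest = hash_filehandle(io.BytesIO(data))
```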

On one run, uploading a 10MB file causes the peak memory footprint to grow from 24MB to 36MB, although the VmSize returned to normal (23MB) after the upload finished. On another run, a 10MB file made the peak size grow from 24MB to 45MB, but uploading a 50MB file did not increase the peak size further.

More results as I get them..

Uploading a 50MB file causes the

warner commented

2007-05-30 04:38:17 +00:00

I instrumented the upload process directly, grabbing VmSize and VmPeak out of /proc/NNN/status at various stages.
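
A sketch of that kind of instrumentation, assuming Linux and the usual `Name:\tvalue kB` layout of `/proc/<pid>/status`. The function names are hypothetical and the sample text is made up for illustration; it is not output from the actual test run.

```python
def parse_vm_fields(status_text, fields=("VmSize", "VmPeak")):
    """Return {field: kB} for the requested fields of a /proc/<pid>/status dump."""
    result = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in fields:
            # Lines look like "VmPeak:\t   37000 kB"
            result[name] = int(rest.split()[0])
    return result


def current_vm_fields():
    """Read the fields for the running process (Linux only)."""
    with open("/proc/self/status") as f:
        return parse_vm_fields(f.read())


# Illustrative input only, not measured data.
sample = "Name:\tpython\nVmPeak:\t   37000 kB\nVmSize:\t   29000 kB\n"
parsed = parse_vm_fields(sample)
```

Calling `current_vm_fields()` before and after each upload stage gives the deltas discussed in the rest of this comment.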

It looks like the simultaneous callRemote("put_block") calls are a significant hit, doubling the memory footprint of the encoded shares for a brief while as they flow out the network. For background, Foolscap puts off serialization until as late as possible, but unless/until we create a custom Slicer for shares, strings are still serialized as strings. So we have the 4*SEGMENT_SIZE encoded shares sitting in RAM, then Encoder._encoded_segment() does a batch of callRemotes in parallel, giving one encoded share to each landlord. When the callRemote is processed (which is generally right away, unless we've kicked Foolscap into streaming mode, and there is no API yet to enable that), the arguments are deep-serialized right away, creating a second copy of those 4*SEGMENT_SIZE shares. Since our SEGMENT_SIZE is 2MiB, that means 8MiB.

When Twisted's write() gets the data (specifically twisted.internet.abstract.FileDescriptor.write), it appends the strings to a list, since it is expecting to get lots of tiny strings. Later, when the socket becomes writable, FileDescriptor.doWrite merges all the strings in that list into a single one, and gives a derived buffer object to the socket. For a brief moment, the list of strings and the merged string are alive at the same time, but since this is happening one connection at a time, that should only bump up our footprint by 80kiB. There might be some other places where buffers get copied, but I'm inclined to doubt it.

So given a 2MiB segment size and a 25-of-100 (i.e. 4x) encoding, we've got:

* 2MiB crypttext (in a list of chunks, in Encoder.do_segment)
* 8MiB encoded shares (these overlap since codec.encode's Deferred is returned pre-fired), in a list of 100 80kiB blocks
* 8MiB serialized callRemote arguments (in the transport's _tempDataBuffer)

The serialized callRemote arguments stick around until we've finished writing them all out to the socket. Encoder._encoded_segment uses a DeferredList for pacing, so we don't do any work on segment 2 until we've finished processing segment 1, so this 2+8+8=18MiB footprint won't overlap from one segment to another.
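
The arithmetic above can be checked with a few lines. The parallel figures come straight from this comment; the serial estimate is a simplifying assumption (only one block's serialized copy alive at a time) and ignores any other overhead.

```python
MiB = 1024 * 1024
KiB = 1024

segment_size = 2 * MiB  # MAX_SEGMENT_SIZE
k, n = 25, 100          # 25-of-100 encoding
expansion = n // k      # 4x

crypttext = segment_size
encoded_shares = expansion * segment_size
serialized_args = encoded_shares  # second copy made when callRemote serializes
peak_parallel = crypttext + encoded_shares + serialized_args

block_size = encoded_shares / n   # one block per share, roughly 80KiB

# Assumption: with serial callRemotes, only one serialized block exists at once.
peak_serial = crypttext + encoded_shares + block_size
saving = peak_parallel - peak_serial
```

Under that assumption the saving is a little under 8MiB, which is in line with the 37MB to 29MB drop in VmPeak reported for the serial experiment.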

As an experiment, I modified Encoder._encoded_segment to do the callRemotes in serial, rather than in parallel. The VmPeak for uploading a 50MiB file dropped from 37MB to 29MB, exactly as expected. Of course, this uses the network very differently and might be faster or slower.

A good thing to keep in mind is that nodes which are uploading files may also be receiving shares for that same file, so you have to add the received-share memory footprint to the sending-share footprint. I hacked the memory test to disable receiving shares to remove the effects of this.

warner commented

2007-05-30 06:24:47 +00:00

Thoughts on the received-share memory footprint:

As the share arrives over the wire inside a Foolscap STRING token, memory usage will vary between 80KiB and 160KiB per share (we get a little bit more of the data, notice that we haven't gotten the whole token yet, append the chunk we got to the buffer, repeat until we have the whole token: each append operation creates a copy, after which the old buffer is released). Once the STRING token is finished, it should remain immutable and not copied elsewhere until it is delivered to the remote_put_block method, which will write it to the bucket and then release it. So even if we're sending multiple shares to a single peer (which will always be the case until we get a mesh with more than 100 peers), I wouldn't expect the received-share memory usage to be very large.

zooko commented

2007-05-30 17:03:27 +00:00

Author

It seems like you didn't reproduce the problem that I was reporting. When I upload a file of 600 million bytes, it uses up more than 700 million bytes of RAM, exceeding the max memory on my system and eventually triggering the arrival of the dreaded Linux Angel of Death -- the OOM killer. If I upload a file of 150 million bytes, it uses up something on the order of 200-300 million bytes of RAM, but it succeeds.

I'm sorry I didn't report this more specifically at the beginning -- I assumed that you had seen the same thing. Good work on the check_memory test! I'll take this ticket back for now... Talk to you soon.

warner was unassigned by zooko

2007-05-30 17:03:27 +00:00

zooko self-assigned this 2007-05-30 17:03:27 +00:00

zooko commented

2007-05-30 17:05:06 +00:00

Author

From what you write above, it sounds like, other than this mysterious O(N) RAM usage problem, the other parts of upload, download, and foolscap are already pretty good about using limited memory.

warner commented

2007-06-06 20:30:12 +00:00

I nailed it down to a call inside twisted.web.http.Request.requestReceived:

                args.update(cgi.parse_multipart(self.content, pdict))

When uploading a 100MB file, the process's memory footprint grows by 200MB during that call. self.content is a filehandle (specifically a tempfile.TemporaryFile()), but apparently the stdlib cgi module is reading it all into memory at some point.

Is there a way to avoid using forms for the upload? Maybe this is just endemic to the web.

Unfortunately, I suspect this will bite an XMLRPC interface as well, unless we build it to use a non-form POST for the data. It'd hit a foolscap interface too, unless/until we make a custom Unslicer that feeds data to a tempfile as it arrives.

warner commented

2007-06-06 20:33:38 +00:00

It's also possible that Nevow's form handling passes around a string instead of a filehandle, so if/when we figure out a twisted.web or cgi.py fix, we may also need to investigate the nevow code. My hunch is that tahoe proper is behaving correctly on this front.

warner changed title from ~~uses up lots of RAM~~ to web upload uses up lots of RAM

2007-06-07 18:09:37 +00:00

zooko commented

2007-07-15 01:24:32 +00:00

Author

Maybe now that the web API works somebody will write a beautiful new AJAXy front-end that uses a proper PUT instead of an HTML form and thus avoids this problem entirely.

That would be a handy way to bypass this problem, as well as gaining a beautiful GUI. (Or WUI, I suppose.)

warner commented

2007-07-16 05:25:49 +00:00

Unfortunately, I believe that an HTML form is the only way for a web page (AJAX or not) to gain access to the local filesystem. So even though JavaScript might be able to do a PUT (and I'm not convinced of that either), it won't be able to get at the local file to do it with.

I think that modifying twisted.web to skip the form parsing (or do it in a more memory-friendly way) is the most likely answer. This sort of internal hackery might also be what we need to begin encoding before the file has finished uploading (the 'streaming upload' goal).

warner commented

2007-08-11 01:43:07 +00:00

I've fixed the POST-side memory footprint problem, in changeset:3bc708529f6a64cb, by replacing the regular twisted Request object with a variant that parses the form elements in a different way. This sort of hackery might induce a dependency upon certain versions of Twisted (because I had to cut-and-paste most of the method, so if twisted.web's internals change, this may no longer work), but I think it is likely to work pretty well.

There were other memory footprint problems that occurred before I made our default be to not send shares to ourselves (#96). #97 talks about the send-to-self problem.

I'm going to declare this ticket (#29) to be about the POST issue and call it closed. The comments here about simultaneous-share-push are still useful, so I'll add a reference from #97 to this one.
