memory leak in allmydata.test.test_web #1729

Closed
opened 2012-05-05 06:05:47 +00:00 by zooko · 7 comments
zooko commented 2012-05-05 06:05:47 +00:00
Owner

Running allmydata.test.test_web with --until-failure results in memory usage increasing until it fails when a subprocess can't be created:

time  ./bin/tahoe @trial --rterror --until-failure allmydata.test.test_web

    test_welcome_page_mkdir_button ...                                     [OK]

-------------------------------------------------------------------------------
Ran 243 tests in 22.676s

PASSED (successes=243)
Test Pass 13
allmydata.test.test_web
  Grid
    test_add_lease ...                                                     [OK]
    test_blacklist ...                                                     [OK]
    test_deep_add_lease ...                                                [OK]
    test_deep_check ...                                                    [OK]
    test_deep_check_and_repair ...                                         [OK]
    test_exceptions ...                                                    [OK]
    test_filecheck ...                                                     [OK]
    test_immutable_unknown ...                                             [OK]
    test_mutant_dirnodes_are_omitted ...                                   [OK]
    test_repair_html ...                                                   [OK]
    test_repair_json ...                                                   [OK]
    test_unknown ...                                                       [OK]
  IntroducerWeb
    test_welcome ... Node._startService failed, aborting
[Failure instance: Traceback: <type 'exceptions.OSError'>: [Errno 12] Cannot allocate memory
/usr/lib/python2.7/threading.py:524:__bootstrap
/usr/lib/python2.7/threading.py:551:__bootstrap_inner
/usr/lib/python2.7/threading.py:504:run
--- <exception caught here> ---
/home/zooko/playground/twisted/twisted/twisted/python/threadpool.py:167:_worker
/home/zooko/playground/twisted/twisted/twisted/python/context.py:118:callWithContext
/home/zooko/playground/twisted/twisted/twisted/python/context.py:81:callWithContext
/home/zooko/playground/tahoe-lafs/dw/src/allmydata/util/iputil.py:224:_synchronously_find_addresses_via_config
/home/zooko/playground/tahoe-lafs/dw/src/allmydata/util/iputil.py:238:_query
/usr/lib/python2.7/subprocess.py:679:__init__
/usr/lib/python2.7/subprocess.py:1143:_execute_child
]
calling os.abort()

real    4m41.030s
user    4m20.853s
sys     0m13.239s

This is on my MacBook Pro running Ubuntu 12.04. Tahoe is current (darcs) trunk:

$ darcs pull
Pulling from "zooko@tahoe-lafs.org:/home/source/darcs/tahoe-lafs/trunk"...
No remote changes to pull in!
$ python setup.py update_version
running update_version
darcsver: wrote '1.9.0-r5493' into src/allmydata/_version.py
$ ./bin/tahoe --version
allmydata-tahoe: 1.9.0-r5493,
foolscap: 0.6.3.post0,
pycryptopp: 0.6.0.1206569328141510525648634803928199668821045408958,
zfec: 1.4.24,
Twisted: 10.1.0,
Nevow: 0.10.0,
zope.interface: unknown,
python: 2.7.3,
platform: Linux-Ubuntu_12.04-x86_64-64bit_ELF,
pyOpenSSL: 0.12,
simplejson: 2.3.2,
pycrypto: 2.4.1,
pyasn1: unknown,
mock: 0.8.0beta3,
sqlite3: 2.6.0 [sqlite 3.7.9],
setuptools: 0.6c16dev3

tahoe-lafs added the code, normal, defect, 1.9.1 labels 2012-05-05 06:05:47 +00:00
tahoe-lafs added this to the undecided milestone 2012-05-05 06:05:47 +00:00
warner commented 2012-05-22 21:23:13 +00:00
Author
Owner

There are four test classes in test_web: Web, IntroducerWeb, Util,
and Grid. I ruled out Util and IntroducerWeb (running them for
several minutes in a row has a flat memory footprint). Running Grid
by itself uses a bunch but seems to peak at 78M VMSize after about
2 or 3 passes. Running Web by itself consumes more and more memory.

I'm pretty sure this is due to the "all_contents" class-level dictionary in allmydata.test.common.FakeCHKFileNode and FakeMutableFileNode. I'll look into parameterizing that with a container which can be discarded between test runs.
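To illustrate why a class-level dictionary behaves this way, here is a minimal sketch. `FakeNode` and `run_one_pass` are hypothetical stand-ins written for this example; this is not the actual code in allmydata.test.common.

```python
class FakeNode(object):
    # Class-level attribute: one dict shared by every instance, and it
    # lives as long as the class object does (normally the whole process).
    all_contents = {}

    def __init__(self, key, data):
        self.key = key
        # Attribute lookup finds the class-level dict, so this mutates
        # the shared container rather than creating a per-instance one.
        self.all_contents[key] = data


def run_one_pass(pass_number, files_per_pass=3):
    # Each "test pass" creates fresh nodes, but their data accumulates
    # in the shared dict and is never released.
    for i in range(files_per_pass):
        FakeNode("pass%d-file%d" % (pass_number, i), b"x" * 1024)


sizes = []
for n in range(4):
    run_one_pass(n)
    sizes.append(len(FakeNode.all_contents))

print(sizes)  # prints [3, 6, 9, 12]
```

The node objects themselves are garbage after each pass, but the data they stored stays reachable through the class, which is the pattern that would produce unbounded growth under repeated runs.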

warner commented 2012-05-22 22:31:33 +00:00
Author
Owner

Yeah, that seemed to be the problem. The patch I'm about to commit moves that container out into the FakeClient created fresh for each test case, which removes the lingering buildup. With that in place, I see the memory usage for --until-failure looping of test_web.Web grow to maybe 250MB or 300MB and then drop back to 150MB (as GC kicks in). I think that closes the leak, although it might be nice to identify why it's still using so much RAM (maybe the test files it's uploading are excessively large).
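The general shape of that change can be sketched as follows. The `FakeClient` and `FakeNode` classes here are simplified, hypothetical versions written for illustration; the constructor signatures are assumptions and do not reproduce the actual patch.

```python
class FakeClient(object):
    def __init__(self):
        # Instance-level container: created fresh with each client,
        # and collectable once the client is no longer referenced.
        self.all_contents = {}


class FakeNode(object):
    def __init__(self, client, key, data):
        # The node borrows the container owned by its client instead of
        # relying on a dictionary stored on the class.
        self.all_contents = client.all_contents
        self.all_contents[key] = data


def run_one_test(test_number, files_per_test=3):
    client = FakeClient()  # built fresh for each test case
    for i in range(files_per_test):
        FakeNode(client, "test%d-file%d" % (test_number, i), b"x" * 1024)
    return len(client.all_contents)


sizes = [run_one_test(n) for n in range(4)]
print(sizes)  # prints [3, 3, 3, 3]
```

Tying the container's lifetime to a per-test object means nothing needs to be explicitly cleared in teardown; the data goes away when the test's client does.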

Brian Warner <warner@lothar.com> commented 2012-05-22 23:07:30 +00:00
Author
Owner

In changeset:bfee999e20aa9fdc:

test_web.py: fix memory leak when run with --until-failure

The Fake*Node classes in test/common.py were accumulating share data in
a class-level dictionary, which persisted from one test run to the next.
As a result, running test_web.py over and over (with trial's
--until-failure feature) made this dictionary grow without bound,
eventually running out of memory.

This fix moves that dictionary into the FakeClient built fresh for each
test, so it doesn't build up. It does the same thing for "file_types",
which was much smaller but still lived at the class level.

Closes #1729
tahoe-lafs added the fixed label 2012-05-22 23:07:30 +00:00
Brian Warner <warner@lothar.com> closed this issue 2012-05-22 23:07:30 +00:00
Brian Warner <warner@lothar.com> commented 2012-05-22 23:08:08 +00:00
Author
Owner

In changeset:bfee999e20aa9fdc:

test_web.py: fix memory leak when run with --until-failure

The Fake*Node classes in test/common.py were accumulating share data in
a class-level dictionary, which persisted from one test run to the next.
As a result, running test_web.py over and over (with trial's
--until-failure feature) made this dictionary grow without bound,
eventually running out of memory.

This fix moves that dictionary into the FakeClient built fresh for each
test, so it doesn't build up. It does the same thing for "file_types",
which was much smaller but still lived at the class level.

Closes #1729
tahoe-lafs modified the milestone from undecided to 1.10.0 2012-05-22 23:09:01 +00:00
Brian Warner <warner@lothar.com> commented 2012-05-23 05:28:01 +00:00
Author
Owner

In changeset:5503/1.9.2:

test_web.py: fix memory leak when run with --until-failure

The Fake*Node classes in test/common.py were accumulating share data in
a class-level dictionary, which persisted from one test run to the next.
As a result, running test_web.py over and over (with trial's
--until-failure feature) made this dictionary grow without bound,
eventually running out of memory.

This fix moves that dictionary into the FakeClient built fresh for each
test, so it doesn't build up. It does the same thing for "file_types",
which was much smaller but still lived at the class level.

Closes #1729
davidsarah commented 2012-05-23 05:28:30 +00:00
Author
Owner

Safe to apply to 1.9.2 since it only affects tests.

tahoe-lafs modified the milestone from 1.10.0 to 1.9.2 2012-05-23 05:28:30 +00:00
Brian Warner <warner@lothar.com> commented 2012-07-10 20:04:56 +00:00
Author
Owner

In changeset:5852/cloud-backend:

test_web.py: fix memory leak when run with --until-failure

The Fake*Node classes in test/common.py were accumulating share data in
a class-level dictionary, which persisted from one test run to the next.
As a result, running test_web.py over and over (with trial's
--until-failure feature) made this dictionary grow without bound,
eventually running out of memory.

This fix moves that dictionary into the FakeClient built fresh for each
test, so it doesn't build up. It does the same thing for "file_types",
which was much smaller but still lived at the class level.

Closes #1729
Reference: tahoe-lafs/trac-2024-07-25#1729