[Imported from Trac: page Performance, version 21]

## Storage Servers

### storage index count

ext3 (on tahoebs1) refuses to create more than 32000 subdirectories in a
single parent directory. In 0.5.1, this appears as a limit on the number of
buckets (one per storage index) that any [StorageServer](StorageServer) can
hold. A simple […] server design (perhaps with a database to locate shares).

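One standard workaround for this kind of per-directory limit (a sketch of the general technique, not necessarily the layout Tahoe adopted) is to fan buckets out across a fixed set of prefix subdirectories derived from the storage index, so that no single directory ever holds more than a bounded number of entries:

```python
import os

def bucket_path(base, storage_index):
    """Place each bucket under a two-character prefix directory derived
    from its storage index. Assuming base32-encoded indexes, a parent
    directory then holds at most 32*32 = 1024 subdirectories instead of
    one subdirectory per bucket."""
    prefix = storage_index[:2]
    return os.path.join(base, prefix, storage_index)

# Hypothetical storage index; real ones are base32-encoded hashes.
p = bucket_path("/storage/shares", "aabbccddeeff")
# p == "/storage/shares/aa/aabbccddeeff"
```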
I was unable to measure a consistent slowdown resulting from having 30000
buckets in a single storage server.

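That measurement can be repeated with a small benchmark along these lines (an illustrative sketch using temporary directories and a scaled-down bucket count, not the original test setup):

```python
import os
import tempfile
import time

def mean_lookup_time(n_buckets, probes=200):
    """Create n_buckets empty bucket directories and time a stat() on
    them, to look for a slowdown as the bucket count grows."""
    with tempfile.TemporaryDirectory() as base:
        for i in range(n_buckets):
            os.mkdir(os.path.join(base, "bucket%05d" % i))
        start = time.perf_counter()
        for i in range(probes):
            os.stat(os.path.join(base, "bucket%05d" % (i % n_buckets)))
        return (time.perf_counter() - start) / probes

# Scaled down from the 30000 buckets used in the test described above.
t_small = mean_lookup_time(100)
t_large = mean_lookup_time(3000)
```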
## System Load

The source:src/allmydata/test/check_load.py tool can be used to generate
random upload/download traffic, to see how much load a Tahoe grid imposes on
its hosts.

Preliminary results on the Allmydata test grid (14 storage servers spread
across four machines (each a roughly 3GHz P4), two web servers): we used
three check_load.py clients running with a 100ms delay between requests, an
80%-download/20%-upload traffic mix, and file sizes distributed exponentially
with a mean of 10kB. These three clients get about 8-15kBps of download and
2.5kBps of upload throughput, doing about one download per second and 0.25
uploads per second. These traffic rates were higher at the beginning of the
test (when the directories were smaller and thus faster to traverse).

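The traffic mix described above can be sketched as follows. This is an illustration of the parameters only, not the actual check_load.py: next_request and run are hypothetical names, and where the real tool issues Tahoe web-API requests, this sketch just counts them.

```python
import random
import time

MEAN_SIZE = 10 * 1000      # exponential file-size distribution, 10kB mean
DOWNLOAD_FRACTION = 0.8    # 80%-download/20%-upload traffic mix
DELAY = 0.1                # 100ms delay between requests

def next_request(rng):
    """Pick one request from the 80/20 mix, with an exponentially
    distributed file size."""
    size = int(rng.expovariate(1.0 / MEAN_SIZE))
    kind = "download" if rng.random() < DOWNLOAD_FRACTION else "upload"
    return kind, size

def run(n_requests, rng=None, delay=0.0):
    # A real run would pass delay=DELAY; 0 here so the sketch finishes fast.
    rng = rng or random.Random(0)
    counts = {"download": 0, "upload": 0}
    total_bytes = 0
    for _ in range(n_requests):
        kind, size = next_request(rng)
        counts[kind] += 1
        total_bytes += size
        time.sleep(delay)
    return counts, total_bytes / n_requests

counts, mean_size = run(10000)
# counts is roughly 80% downloads; mean_size is close to 10kB
```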
The storage servers were minimally loaded. Each storage node was consuming
about 9% of its CPU at the start of the test and 5% at the end. These nodes
were receiving about 50kbps throughout, and sending 50kbps initially
(increasing to 150kbps as the dirnodes got larger). Memory usage was trivial:
about 35MB [VmSize](VmSize) per node and 25MB RSS. The load average on a box
hosting four nodes was about 0.3.

The two machines serving as web servers (performing all encryption, hashing,
and erasure-coding) were the most heavily loaded. The clients distribute
their requests randomly between the two web servers. Each server was
averaging 60%-80% CPU usage. Memory consumption was minor: 37MB
[VmSize](VmSize) and 29MB RSS on one server, 45MB/33MB on the other. The load
average grew from about 0.6 at the start of the test to about 0.8 at the end.
Outbound network traffic (including both client-side plaintext and
server-side shares) stayed at about 600Kbps for the whole test, while inbound
traffic started at 200Kbps and rose to about 1Mbps by the end.

### initial conclusions

So far, Tahoe is scaling as designed: the client nodes are the ones doing
most of the work, since they are the easiest to scale. In a deployment where
central machines do the encoding work, CPU on those machines will be the
first bottleneck. Profiling can be used to determine how the upload process
might be optimized: we don't yet know whether encryption, hashing, or
encoding is the primary CPU consumer. We can change the upload/download ratio
to examine upload and download separately.

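To answer that profiling question, Python's cProfile can break an upload's CPU time down by function. A minimal sketch, where upload_file is a hypothetical stand-in that only hashes (we don't know the real upload path's hot spots, which is the point of profiling it):

```python
import cProfile
import hashlib
import io
import os
import pstats

def upload_file(data):
    """Hypothetical stand-in for the upload path: hash the data in
    blocks, as the encryption/hashing/encoding pipeline would."""
    h = hashlib.sha256()
    for i in range(0, len(data), 4096):
        h.update(data[i:i + 4096])
    return h.hexdigest()

profiler = cProfile.Profile()
profiler.enable()
upload_file(os.urandom(1_000_000))
profiler.disable()

# Report the top functions by cumulative CPU time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
```

Run against the real upload code, the same report would show whether the AES, SHA, or FEC routines dominate.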
Deploying large networks in which clients do not do their own encoding will
require sufficient CPU resources. Storage servers use minimal CPU, so having
every storage server double as a web/encoding server is a natural approach.