investigate best FUSE+sshfs options to use for performance and correctness of SFTP via sshfs #1189

Open
opened 2010-08-28 01:13:53 +00:00 by davidsarah · 4 comments
davidsarah commented 2010-08-28 01:13:53 +00:00
Owner

It looks as though at least `direct_io` and `big_writes` may be beneficial, so that writes are not limited to 4 KiB blocks.
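A minimal sketch of the mount lines in question, using the same placeholders (PORT, server, mnt/) as in the tests below:

```sh
# Default mount: the kernel FUSE layer splits writes into 4 KiB requests
# before they reach sshfs.
sshfs -p PORT server:/ mnt/

# With the proposed options: big_writes allows FUSE write requests larger
# than 4 KiB, and direct_io bypasses the kernel page cache for reads/writes.
sshfs -p PORT -o direct_io,big_writes server:/ mnt/
```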

tahoe-lafs added the
code-frontend
major
defect
1.8β
labels 2010-08-28 01:13:53 +00:00
tahoe-lafs added this to the undecided milestone 2010-08-28 01:13:53 +00:00
Author
Owner

I started doing a couple of random tests I could think of. I didn't repeat many of the tests, since they take a while and it's all manual, but it gives a rough idea:

Most of these tests involved copying a large file (a 728 MB .iso) to and from a Tahoe introducer/storage client running inside a VirtualBox VM on the same host (both guest and host running Ubuntu). Most of the copying was done with "time rsync -rhPa ", and when copying a large file to Tahoe, the command hangs for another minute or so after the transfer finishes. I checked the flog during this time and there was activity (reads, I think), so it may be that rsync checksums the file after the transfer; I'm not sure.

To verify the file was being transferred correctly, I also ran "time md5sum mnt/iso". I would expect this to behave like simply reading the file, but for some reason it performed differently...

with it mounted as: "sshfs -p PORT server:/ mnt/"

  • rsync .iso TO tahoe: 4m51s to transfer ( 2.39 MB/s ), 6m14s total
  • rsync .iso FROM tahoe: 5m42s ( 2.13 MB/s )
  • cp .iso FROM tahoe: 5m31s
  • time md5sum iso:
    • first try: 7m46s
    • second try: 5m13s

with it mounted as: "sshfs -p PORT -o direct_io,big_writes server:/ mnt/"

  • rsync .iso TO tahoe: 3m37s to transfer ( 3.19 MB/s ), 5m1s total
  • rsync .iso FROM tahoe: 5m15s ( 2.31 MB/s )
  • cp .iso FROM tahoe: 5m31s
  • time md5sum iso:
    • first try: 10m40s
    • second try: 9m25s

Obviously I expect the values to fluctuate a bit, but it seems like direct_io,big_writes bumps up the write speed somewhat without really affecting the read speed. I'm not really sure why it hurt the md5sum time so badly...

I also tried to rsync a large directory of source files (4 MB, 981 files) to Tahoe, but it seems to behave oddly and stalls a lot, resulting in a very long transfer time (120-140 minutes). This happened with and without the options.
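For reference, the tests above were roughly the following sequence of commands (ISO name, destination paths, PORT, server, and mnt/ are placeholders; the exact file names aren't recorded here):

```sh
sshfs -p PORT server:/ mnt/                       # or: -o direct_io,big_writes

time rsync -rhPa large.iso mnt/                   # write to tahoe via sshfs
time rsync -rhPa mnt/large.iso /tmp/from-tahoe/   # read back from tahoe
time cp mnt/large.iso /tmp/from-tahoe-cp.iso      # plain read
time md5sum mnt/large.iso                         # read + checksum

fusermount -u mnt/                                # unmount between runs
```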

zooko commented 2010-08-28 14:40:15 +00:00
Author
Owner

Dear bj0: thanks for the report!

What version(s) of Tahoe-LAFS were you using? If you have just been tracking the official trunk repo at http://tahoe-lafs.org/source/tahoe-lafs/trunk and haven't applied any other patches, then you can find out by running `make make-version`.

davidsarah commented 2010-08-28 22:49:18 +00:00
Author
Owner

Thanks bj0.

`big_writes` should only affect writes, and I can't immediately see why it would have anything but a beneficial effect. `direct_io` might affect both reads and writes, and could cause some loss of performance for applications whose performance depends on kernel caching. Can you try the same tests with `-o big_writes` only?
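For example, the mount line for that test would look something like this (same placeholders as in your tests):

```sh
# big_writes only, no direct_io, so the kernel page cache stays in play for reads.
sshfs -p PORT -o big_writes server:/ mnt/
```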

[I initially suggested both because http://xtreemfs.blogspot.com/2008/08/fuse-performance.html said that `direct_io` was needed (at least for the versions of FUSE and Linux tested in 2008) to support writing in blocks greater than 4 KiB. However, point 2 in http://article.gmane.org/gmane.comp.file-systems.fuse.devel/5292 suggests that this restriction might have been lifted.]

What Linux kernel version and sshfs version did you use?

Author
Owner

`make make-version` returned:
setup.py darcsver: wrote '1.8.0c2-r4702' into src/allmydata/_version.py

sshfs Version: 2.2-1build1 (from ubuntu repo)

client uname -a: Linux nazgul 2.6.32-020632-generic #020632 SMP Thu Dec 3 10:09:58 UTC 2009 x86_64 GNU/Linux
server uname -a: Linux testbuntu 2.6.32-24-generic #41-Ubuntu SMP Thu Aug 19 01:12:52 UTC 2010 i686 GNU/Linux

I was going to try the tests without direct_io, but I seem to be having trouble with my VM...

tahoe-lafs added
code-frontend-ftp-sftp
and removed
code-frontend
labels 2014-12-02 19:44:04 +00:00