benchmark Tahoe-LAFS compared to nosql dbs #932

Open
opened 2010-01-30 01:43:33 +00:00 by zooko · 10 comments
zooko commented 2010-01-30 01:43:33 +00:00
Owner

I'm curious how Tahoe-LAFS performs compared to nosql databases on the nosqlish loads that those users care about. Aaron Cordova did some benchmarks of Tahoe-LAFS vs. HDFS as the storage backend for Hadoop and reported in his HadoopWorld presentation that they performed about the same for the map-reduce computation (which is a read-intensive workload): http://www.slideshare.net/cloudera/hw09-map-reduce-over-tahoe-a-least-authority-encrypted-distributed-filesystem

Recently a scientist from Yahoo posted about his benchmarks of various nosql systems:

http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201001.mbox/%3cC2D6929236FAC846B7A4FE1EC39910C64F27B52F25@SP1-EX07VS01.ds.corp.yahoo.com%3e

He says that his benchmarking code will be open-sourced soon pending approval from Yahoo's legal department. Maybe we could contribute patches that make Tahoe-LAFS one of the systems that his benchmark system can measure.

N.B. not to get anyone's hopes up, I would expect Tahoe-LAFS to perform very badly on those workloads! They typically want to assign values to user-specified keys, which we don't have a native implementation of and which we would have to simulate somehow, such as by letting the user-chosen keys be the childnames in a mutable directory. So I would expect Tahoe-LAFS to be pretty much off the charts for bad performance on those workloads. But, I might be pleasantly surprised. And also: "What gets measured gets improved!" :-)
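
The "user-chosen keys as childnames in a mutable directory" idea can be sketched against the Tahoe-LAFS web API, where a child of a directory is addressed as `/uri/$DIRCAP/CHILDNAME` on the gateway (PUT to write it, GET to read it). This is only an illustration of the key-to-URL mapping, not code from any existing client: the class and method names are hypothetical, the directory cap is a made-up placeholder, and the gateway address assumes the default local port 3456.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: maps a user-chosen key to the web-API URL of a
// child in one mutable directory. Names here are illustrative assumptions.
final class TahoeKeyUrl {
    private TahoeKeyUrl() {}

    // Percent-encode one path segment. URLEncoder produces form encoding,
    // where a space becomes '+', so '+' is rewritten to "%20" afterwards.
    // (A literal '+' in the input has already become "%2B" at that point.)
    static String encodeSegment(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // URL to PUT (store a value) or GET (fetch a value) for the given key.
    static String childUrl(String gatewayBase, String dirCap, String key) {
        return gatewayBase + "/uri/" + encodeSegment(dirCap) + "/" + encodeSegment(key);
    }

    public static void main(String[] args) {
        // Placeholder cap; a real one comes from creating a directory on the grid.
        System.out.println(childUrl("http://127.0.0.1:3456", "URI:DIR2:aaaa:bbbb", "user 42"));
    }
}
```

Every insert or update under this scheme rewrites the one mutable directory that holds all the keys, which is one reason to expect poor write throughput.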

tahoe-lafs added the dev-infrastructure, major, enhancement, 1.5.0 labels 2010-01-30 01:43:33 +00:00
tahoe-lafs added this to the undecided milestone 2010-01-30 01:43:33 +00:00
zooko commented 2010-10-22 23:13:54 +00:00
Author
Owner

That benchmark that Brian Frank Cooper said would be open sourced has subsequently been open sourced:

http://github.com/brianfrankcooper/YCSB/wiki

bibilthaysose commented 2011-08-30 17:54:01 +00:00
Author
Owner

I'm going to attempt this benchmarking against mongo.

bibilthaysose commented 2011-09-02 00:26:55 +00:00
Author
Owner

YCSB Interface layer skeleton @ https://github.com/grubino/Tahoe-YCSB--Interface-Layer/blob/master/TahoeLAFSClient.java

ping me if you want to help out, and i'll give out push privileges.

bibilthaysose commented 2011-09-14 15:56:14 +00:00
Author
Owner

reorganized and updated Tahoe java driver:

https://github.com/grubino/Tahoe-YCSB--Interface-Layer/blob/master/org/lafs/TahoeLAFSConnection.java

Currently blocked on figuring out why the InputStream returned by HttpResponse.getEntity().getContent() is empty. The request seems to be processed correctly, but there's no content, which can't be correct. Probably something I'm doing wrong with the Apache HTTP interface. I'll ask around.

zooko commented 2011-09-14 17:01:58 +00:00
Author
Owner

What does Apache have to do with it? Isn't the HTTP server the Tahoe-LAFS gateway?

bibilthaysose commented 2011-10-26 14:16:45 +00:00
Author
Owner

Hi zooko, org.apache.http.[...] is the client-side web interface that I'm using. If you followed the link that I provided, you should have seen some 'import org.apache.[...]' statements in the top of the source files. That's what I was referring to. It turns out that in the Java community, the apache http classes are preferred to the native Java ones. Go figure! Anywho, I believe I've ironed out most of the problems I was having there. I'm currently talking to one of the maintainers of the MongoDB YCSB layer to find out how to get this merged into the YCSB repo, or at least reviewed by someone who knows Java and YCSB. That reminds me: PLEASE_REVIEW_THIS_CODE (when you get a chance):

https://github.com/grubino/Tahoe-YCSB--Interface-Layer

I'm sure that I've run afoul of Java best practices and general development best practices, and I invite anyone reading this to pleez point out my mistakes to me. I've looked over the code and have found a few things that I want to fix, but I'm sure I'm missing some stuff. Also, and not least of all, having reviewers makes me feel loved.

bibilthaysose commented 2011-10-26 14:23:13 +00:00
Author
Owner

I forgot to mention that I have been able to run some of the workloads (most notably workloada), and write performance is more than four orders of magnitude worse for Tahoe-LAFS than for MongoDB. Mongo writes about 11,000 entries/sec (on my thinkpad T50) and my Tahoe-LAFS test grid (1:1:1) writes about 0.5 entries/sec (that's one entry every two seconds) or so, a gap of roughly 22,000x. I'm not sure if that number would go up or down if I increased N/H/K. I'll post the real numbers when I have them handy, but it hasn't been a priority because there are other workloads that don't seem to be running properly. I want to make sure that the code is relatively bug-free before I actually post the numbers.
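
For scale, the gap between the two rough figures quoted in this comment can be worked out directly. A minimal sketch, using only those two approximate numbers (11,000 and 0.5 entries/sec); the class and method names are made up for illustration, and the results are only as good as the preliminary measurements.

```java
// Hypothetical helper for comparing two throughput figures.
final class ThroughputGap {
    private ThroughputGap() {}

    // How many times faster the first system is than the second.
    static double ratio(double fastOpsPerSec, double slowOpsPerSec) {
        return fastOpsPerSec / slowOpsPerSec;
    }

    // The same gap expressed in orders of magnitude (base-10 logarithm).
    static double ordersOfMagnitude(double fastOpsPerSec, double slowOpsPerSec) {
        return Math.log10(ratio(fastOpsPerSec, slowOpsPerSec));
    }

    // Average time taken by one operation, in seconds.
    static double secondsPerOp(double opsPerSec) {
        return 1.0 / opsPerSec;
    }

    public static void main(String[] args) {
        double mongo = 11000.0; // entries/sec, approximate figure from the comment
        double tahoe = 0.5;     // entries/sec, approximate figure from the comment
        System.out.println("ratio: " + ratio(mongo, tahoe));
        System.out.println("orders of magnitude: " + ordersOfMagnitude(mongo, tahoe));
        System.out.println("seconds per Tahoe write: " + secondsPerOp(tahoe));
    }
}
```

With these inputs the ratio is 22,000, or a little over four orders of magnitude, and each Tahoe-LAFS write takes about two seconds.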

zooko commented 2011-10-26 14:27:50 +00:00
Author
Owner

Very cool! Real numbers! I look forward to having the time to investigate this. :-)

bibilthaysose commented 2011-10-28 22:31:12 +00:00
Author
Owner

Need a public place to put TahoeLAFSConnection.jar.

Currently, I just have the source directly in the YCSB tree (err my branch of it):

https://github.com/grubino/YCSB/tree/master/db/tahoe/src/org/lafs

But this isn't really appropriate since the TahoeLAFSConnection class is not really part of YCSB, and I don't think this is going to pass muster with the YCSB maintainers. So once I jar this up, I'll need to put it somewhere that I can link from in the Tahoe YCSB client docs. Preferably somewhere on tahoe-lafs.org. Also, someone from the project may want to review the code at some point and make sure I didn't do anything too horrendous. It might actually be appropriate to put the source for this in the darcs repo too at some point. That would have the nice side-effect of increasing the likelihood that someone from the project would look at it.

zooko commented 2012-03-22 21:00:14 +00:00
Author
Owner

Let's create a project below https://github.com/tahoe-lafs for this.

Reference: tahoe-lafs/trac-2024-07-25#932