Sneakernet grid scenario #1657

Open
opened 2012-01-16 01:03:18 +00:00 by amontero · 8 comments
amontero commented 2012-01-16 01:03:18 +00:00

See also related issues: #793, #1107, #2123.

Hi all.

I'm trying to set up a familynet/sneakernet grid.

As I learn more about Tahoe, I'm still working through some issues that stand between me and my goals. I've created this ticket to keep track of those issues and the relevant tickets. Later, as I get advice, comments, and suggestions, I will spawn more detailed tickets as necessary and track them here as well.

Use case

Family grid for reciprocally storing each member's personal files (mostly photos). I will be the sole admin of the grid because the other grid members don't have the skills to manage it.

As the grid admin, I created a root: dir, and I create each user's home dir as a subdirectory of root. The users store their picture backups in their home dirs via "tahoe backup". So keeping the root dir healthy across all members is enough to keep those backups safe.

I set up an introducer on the public Internet and a local storage node for each grid member. An important point here is that most of the time, when users do their backups, their local node will be the only node present on the grid. So I lowered "shares.happy" to 1; the other settings are left at their defaults (shares.needed/shares.happy/shares.total default to 3/7/10). Hence the name 'sneakernet' grid.
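For reference, the settings described above live in the [client] section of each node's tahoe.cfg. The sketch below is a minimal illustration, not part of Tahoe: the helper name sneakernet_client_section and its sanity checks are hypothetical, and it uses the standard-library configparser to render only the three erasure-coding keys. A real tahoe.cfg has other [client] entries (such as the introducer setting) that this does not touch, so in practice you would edit the existing file rather than generate a new one.

```python
import configparser
import io


def sneakernet_client_section(needed: int = 3, happy: int = 1, total: int = 10) -> str:
    """Hypothetical helper: render the erasure-coding keys of tahoe.cfg's
    [client] section for a node that usually uploads while isolated."""
    # Basic sanity checks chosen for this sketch only.
    if not 1 <= needed <= total:
        raise ValueError("need 1 <= shares.needed <= shares.total")
    if not 1 <= happy <= total:
        raise ValueError("need 1 <= shares.happy <= shares.total")
    cfg = configparser.ConfigParser()
    cfg["client"] = {
        "shares.needed": str(needed),
        "shares.happy": str(happy),  # 1: a lone local node satisfies the upload
        "shares.total": str(total),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(sneakernet_client_section())
```

With the defaults shown, this prints a [client] header followed by shares.needed = 3, shares.happy = 1 and shares.total = 10.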

I do the replication work manually when 2 nodes rendezvous; that's the only time they have a direct (i.e. LAN) connection.
For example, my brother keeps his node on an external USB drive and brings it with him to my home. My computer is a desktop, but my dad's is a laptop, and so on.

At a rendezvous, members bring their "home" nodes, carrying their latest backups, to another member's node and run a repair; in the exchange, that member's backups get replicated too.

Issues

Here are some issues and how I'm addressing them:

  1. Storage use: I don't want any node to store a full set of shares, since that doesn't add security to the grid and wastes space. I want each member to hold x+1 shares, where x is enough for the file to be readable from that single node. Right now I accomplish this by fully repairing the grid on that node while it's isolated, and then pruning its stored shares down to the desired count with a script (it's dirty, but it works). Thinking about it, I've come to the conclusion that a 'hold-no-more-than-Z-shares' kind of setting for storage nodes would help me a lot (proposed implementation at #2124). Ticket #711 would also be useful, as would #1340 and a comment on #1212.
  2. Repairing: Related to the above, I always have to make sure that no repair ends with all shares on the same node. So before doing a repair between 2 nodes, I make sure each isolated node is 100% repaired (10/10) and all files are healthy. Then I 'prune' the stored shares down to 5, and only then can I do a 2-node verify/repair. I know this is very inefficient, so any advice on how to improve it is welcome.
  3. Verification: I would like to add a crontab entry on each node that runs a deep-check-verify of the root verifycap, but currently I can't because of #568. So I'm keeping an eye on that ticket.
  4. Verification: In my usage scenario, a file should count as healthy if it is readable from the local node alone, or the threshold should be configurable. Related issues: #614, #1212.
  5. Verification caps: I also planned to ease the verification/repair process via the WUI by linking the root verifycap into each user's home dir, but the WUI gives me an error when I attempt it. I also plan to use this to establish a "reciprocity list" for each user: if I open the grid to outsiders and don't want them to hold some users' home dirs, a "verifycaps" folder containing the verifycaps of the desired users' home dirs will do. For both members and outsiders, they just have to deep-check-repair their home's verifycaps dir.
  6. Helper: Another idea I've had is a helper node that could "spool" the shares until they have been pushed to at least X different nodes, or until a configurable expiration. Since the helper would be reachable by everyone, it would mitigate the isolation effect when doing backups. This could be useful for other use cases too, IMO.
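Items 1, 2 and 4 come down to bookkeeping over share numbers. Here is a minimal Python sketch, assuming each file is encoded as shares 0..N-1 and that a node's holdings can be described as a set of share numbers. The functions plan_placement, locally_readable and prune_plan are hypothetical and call no Tahoe API; Tahoe's uploader and repairer choose placement themselves, so this only models the plan a pruning script might follow. It spreads shares so that no node exceeds a cap Z, checks that a lone node can still read a file, and lists which shares to delete after an isolated full repair.

```python
from typing import Dict, List, Set


def plan_placement(nodes: List[str], total: int, needed: int, cap: int) -> Dict[str, Set[int]]:
    """Hypothetical planner: spread share numbers 0..total-1 over `nodes`,
    least-loaded node first, never giving one node more than `cap` shares."""
    if cap < needed:
        # A node capped below `needed` could never read the file on its own.
        raise ValueError("cap must be at least `needed`")
    placement: Dict[str, Set[int]] = {node: set() for node in nodes}
    for shnum in range(total):
        open_nodes = [n for n in nodes if len(placement[n]) < cap]
        if not open_nodes:
            break  # every node is at its cap; the remaining shares are not kept
        target = min(open_nodes, key=lambda n: len(placement[n]))
        placement[target].add(shnum)
    return placement


def locally_readable(held: Set[int], needed: int) -> bool:
    """Item 4: treat a file as healthy on a node if that node alone holds
    at least `needed` distinct shares."""
    return len(held) >= needed


def prune_plan(held: Set[int], keep: Set[int]) -> Set[int]:
    """Item 1: share numbers a pruning script would delete from a node."""
    return held - keep


plan = plan_placement(["me", "brother"], total=10, needed=3, cap=4)
print({node: sorted(shares) for node, shares in plan.items()})
# {'me': [0, 2, 4, 6], 'brother': [1, 3, 5, 7]}

# After an isolated full repair, "me" holds all 10 shares; list the extras.
print(sorted(prune_plan(set(range(10)), plan["me"])))
# [1, 3, 5, 7, 8, 9]
```

Filling the least-loaded node first keeps the two nodes' holdings disjoint, so a 2-node verify/repair sees distinct shares rather than duplicates. With two nodes and a cap of 4 (k+1 for k = 3), shares 8 and 9 land on no node, which is acceptable here because each node can still read the file alone; with three nodes the same cap yields 4, 3 and 3 shares, so locally_readable still holds but not every node gets the extra share.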

I've also read a lot of tickets about rebalancing and server distribution, but I doubt they fit my use case. And since I'm not a Python programmer, I think bite-sized, simpler issues will let me help test improvements and suggestions and get to a usable state soon.

I'll keep adding issues as they come up. I know I'm addressing too many issues in a single ticket, but I'm doing it to keep them organized in one place. I hope to get some starting tips or advice on improving my use case, and I will gladly open new tickets as needed to get into the details, referencing this ticket.
Later, this issue can serve as base documentation for anyone trying to set up the same scenario.

Thanks in advance.

tahoe-lafs added the unknown, major, enhancement, and 1.9.0 labels 2012-01-16 01:03:18 +00:00
tahoe-lafs added this to the undecided milestone 2012-01-16 01:03:18 +00:00
zooko commented 2012-01-16 06:32:49 +00:00

Hi, amontero! Welcome. Thanks for the good ticket. Please see also https://tahoe-lafs.org/trac/tahoe-lafs/wiki/ServerSelection if you haven't already, and add a link from there back to this ticket. Thank you!

amontero commented 2012-01-16 19:35:46 +00:00

Replying to zooko:

Hi, amontero! Welcome. Thanks for the good ticket. Please see also https://tahoe-lafs.org/trac/tahoe-lafs/wiki/ServerSelection if you haven't already, and add a link from there back to this ticket. Thank you!

Hi Zooko,
Thanks for the prompt response. I've linked this ticket from ServerSelection and also from the UseCases page.
I will wait some time for input on how I can improve my scenario. If there isn't much room for improvement, I will spawn tickets as necessary to document the needed features.

davidsarah commented 2012-01-16 22:57:40 +00:00

I want each member to hold x+1 shares, where x is enough for the file to be readable from that single node.

How does this improve on replication (i.e. k = 1)? Replication is simpler.

tahoe-lafs added 1.8.3 and removed 1.9.0 labels 2012-01-20 20:53:41 +00:00
amontero commented 2012-01-20 21:01:55 +00:00

Replying to davidsarah:

I want each member to hold x+1 shares, where x is enough for the file to be readable from that single node.

How does this improve on replication (i.e. k = 1)? Replication is simpler.

Hi [...]

Since the nodes will be isolated most of the time, the default k/happy/N of 3/7/10 doesn't work, so for now I'm using 3/1/10. I'm using 3/7 just because it is the recommended setting in the docs, and I think (maybe I'm wrong) it's a good enough setting for my goals.

Do you mean setting 1/1/10? As I understand it, that would be the worst case for wasted space. I ran the test, just to be sure, and it is a tenfold increase in space requirements when uploading. The grid members have to do their backups in isolation and replicate to other nodes when they rendezvous.

Or do you mean some other values for N/k? You made me think about it, and since my goal looks like a replication setup (with LAFS privacy), it may work. But apart from the expansion factor, I see no difference with, let's say, 1/1/2. I'm aware that since shares would be numbered either 0 or 1, scripting their management would be somewhat simpler, but the annoyances when repairing would remain.

A downside I can think of with fewer shares is losing the ability to hold one "extra share more than needed" at a reasonable space cost. I would like to keep that, just to be covered against bit flips, since I'll also be using external USB drives as nodes. However, that is secondary, and I can give up that ability if it makes things easier.

I've set up a testbed with 1/1/2, and as long as issues 1 and 2 remain, it makes little difference when managing the grid nodes. So I'd like to ask you to help me better understand which N/k parameters you mean.

Thanks.
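A quick way to compare the k/happy/N settings discussed here is the expansion factor, which is roughly shares.total / shares.needed because each share is about 1/k of the file. The Python sketch below is illustrative only: the helper names are hypothetical and it ignores per-share overhead such as hashes and headers. shares.happy is left out because it governs where an upload must place shares, not how much is stored.

```python
from fractions import Fraction


def expansion_factor(needed: int, total: int) -> Fraction:
    """Approximate bytes stored grid-wide per byte of file (overhead ignored)."""
    # Each share is about 1/needed of the file, and `total` shares are produced.
    return Fraction(total, needed)


def per_node_size(shares_held: int, needed: int) -> Fraction:
    """Approximate bytes one node stores per byte of file if it keeps `shares_held` shares."""
    return Fraction(shares_held, needed)


for needed, total in [(3, 10), (1, 10), (1, 2)]:
    print(f"{needed}-of-{total}: {float(expansion_factor(needed, total)):.2f}x grid-wide")

# The 'x+1 shares per node' idea with k = 3: each node keeps 4 shares.
print(f"4 shares of 3-of-10 on one node: {float(per_node_size(4, 3)):.2f}x")
```

Under this model, 1-of-10 stores ten times the file size (the tenfold increase mentioned above) and 3-of-10 about 3.33 times, while a node keeping 4 shares of a 3-of-10 file pays about 1.33x the file size, versus 1.00x for the single share it would hold under 1-of-2.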

davidsarah commented 2012-03-12 19:24:02 +00:00

I meant, how is (k, (k+1)*H, (k+1)*Z) for k > 1 and (k+1) shares on each node, better than (1, H, Z)? They have approximately the same reliability, but (1, H, Z) is simpler and has a slightly lower expansion factor. The extra share on each node doesn't make much difference to reliability because bit flips are not much more likely than whole node failures.
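To make that comparison concrete, here is a worked version of the arithmetic. It assumes the tuples read as (shares.needed, shares.happy, shares.total), that Z nodes each keep their full allotment (1 share for replication, k+1 shares otherwise), and that nodes fail independently and as a whole with probability p, ignoring bit flips. The expansion factor e is total/needed:

```latex
\begin{align*}
e_{\text{rep}} &= \frac{Z}{1} = Z \\
e_{\text{ec}}  &= \frac{(k+1)Z}{k} = Z + \frac{Z}{k} \\
\frac{e_{\text{ec}}}{e_{\text{rep}}} &= \frac{k+1}{k} \\
P_{\text{loss}} &= p^{Z} \quad \text{(both schemes)}
\end{align*}
```

Every node holds at least as many shares as needed (1 ≥ 1, and k+1 ≥ k), so in both schemes a file is lost only when all Z nodes fail. For example, with k = 3 and Z = 2 nodes, e_ec = 8/3 ≈ 2.67 against e_rep = 2: the same loss probability under this model for a third more storage.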

tahoe-lafs added normal and removed major labels 2012-03-29 19:12:37 +00:00
alsuren commented 2012-08-07 17:50:14 +00:00

Sorry for the "me too" bug comment noise, but I can't find a way to subscribe to changes on this ticket.

The low upload bandwidth problem becomes painfully obvious when bootstrapping any cloud-based backup/sharing service (like Google Drive and Dropbox) with more than 10GB of data.

http://www.kickstarter.com/projects/joeyh/git-annex-assistant-like-dropbox-but-with-your-own is an attempt to solve a similar use-case.

amontero commented 2013-11-30 20:15:55 +00:00

I've opened #2123, which would help get closer to addressing items 1, 2 and 4.

amontero commented 2013-12-12 15:17:21 +00:00

Added links to related tickets #793 and #1107 in the summary.

tahoe-lafs added code-network and removed unknown labels 2014-09-11 22:33:07 +00:00
Reference: tahoe-lafs/trac-2024-07-25#1657