blacklist support #1425

Closed
opened 2011-06-27 17:43:59 +00:00 by warner · 28 comments

For various reasons, webapi gateway operators might want to have the ability to deny access to specific files. Putting this directly in tahoe, rather than obligating these operators to run a frontend proxy (like apache or nginx or something), will make it easier for everyone to use.

The attached patch creates a blacklist file, with a list of storage-index strings and a reason for each. Any webapi operation (indeed any operation, so FTP/SFTP too) that tries to access a node with one of the given SIs will throw an exception that contains the reason. The webapi frontend translates this exception into an HTTP "403 Forbidden" response.

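As a rough sketch of the format described here — one line per entry, a storage-index string followed by a reason — a hypothetical parser might look like this (names and sample SIs are illustrative, not the actual patch):

```python
# Hypothetical sketch (not the actual patch): parse a blacklist file of
# "storage_index reason..." lines into a dict mapping SI -> reason.
def parse_blacklist(text):
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        si, _, reason = line.partition(" ")
        entries[si] = reason.strip()
    return entries

sample = """
# takedown requests
aaaa1111 my puppy told me to
bbbb2222 copyright complaint
"""
entries = parse_blacklist(sample)
```

A gateway would then refuse any operation whose node's storage index appears in `entries`, reporting the associated reason.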
warner added the
c/code-frontend-web
p/major
t/enhancement
v/1.8.2
labels 2011-06-27 17:43:59 +00:00
warner added this to the 1.9.0 milestone 2011-06-27 17:43:59 +00:00
Author

It turned out that the easiest implementation puts the blacklist right in `IClient.create_node_from_uri`, which means it's not limited to the webapi: FTP/SFTP (and any internal operations) will honor the blacklist too. So I'm looking for a better name for the config file than `$NODEDIR/webapi.blacklist`. I think just "blacklist" is a bit short, but could be convinced otherwise. Thoughts?
zooko was assigned by warner 2011-06-27 17:47:01 +00:00
zooko was unassigned by daira 2011-08-01 02:24:08 +00:00
daira self-assigned this 2011-08-01 02:24:08 +00:00
Author

Attachment blacklist.diff (19858 bytes) added

updated patch: block directories too, name 'access.blacklist'

Author

The new patch moves the blacklist check a bit deeper (into NodeMaker), so it correctly blocks directories too. I renamed the file to `access.blacklist` since it's not webapi-specific, and updated the docs to match both (and added tests to make sure that directories and their children can be blocked).

In webapi.rst, 'webapi' -> 'web-API'.

The time to do an `os.stat` on a file that does not exist and catch the resulting `OSError` seems to be around 0.005 ms on my machine. Are we sure that it isn't premature optimization to avoid this check on each web-API request when the file didn't exist at start-up?

(For some reason it is faster to catch the exception from `os.stat` than to use `os.path.exists`. A significant part of the difference is just the time to do the `.path` and `.exists` attribute accesses, which shows a) how unoptimized CPython is, and b) that we shouldn't worry too much about OS calls being expensive relative to Python code.)
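The timing claim above is easy to reproduce with a quick micro-benchmark (numbers will of course vary by machine and OS; this is an illustration, not part of the patch):

```python
import os
import timeit

MISSING = "hopefully-nonexistent-file"

def stat_and_catch():
    # the approach the patch uses: stat and catch the error
    try:
        os.stat(MISSING)
    except OSError:
        pass

def use_path_exists():
    # the alternative: os.path.exists (which stats internally)
    os.path.exists(MISSING)

N = 10000
per_stat = timeit.timeit(stat_and_catch, number=N) / N * 1000
per_exists = timeit.timeit(use_path_exists, number=N) / N * 1000
print("stat+except:    %.5f ms/call" % per_stat)
print("os.path.exists: %.5f ms/call" % per_exists)
```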

If the access.blacklist file is deleted while a node is running, subsequent requests will fail.

It seems more usable to allow the reason to contain spaces. I know this complicates extending the format to have additional fields, but we probably won't need that, and if we did then we could use something like

```
storage_index;new_field The men in black came round to my house.
```

The line wrapping changes to create_node_from_uri are not necessary; I don't object to them but they probably shouldn't be in this patch.

If a request happens at the same time as the access.blacklist file is being rewritten, the request may reread an incomplete copy of the file. (The process that rewrites access.blacklist could avoid this by writing the new file and then moving it into place, but the patch doesn't document that this is necessary, and also it won't work on Windows.)

When an access is blocked, the SI of the file should probably be logged. (I think the failed request will already be logged, but the exception only gives the reason string, which isn't necessarily unique to a file.)

daira removed their assignment 2011-08-04 00:31:55 +00:00
warner was assigned by daira 2011-08-04 00:31:55 +00:00

If a request happens at the same time as the access.blacklist file is being rewritten, the request may reread an incomplete copy of the file. (The process that rewrites access.blacklist could avoid this by writing the new file and then moving it into place, but the patch doesn't document that this is necessary, and also it won't work on Windows.)

If you use [FilePath.setContent](http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FilePath.html#setContent) then your code can be simple while having this behavior. In addition, on Windows it will write to a new file, delete the old file, and move the new file into place. That should, if I understand correctly, guarantee that an uncoordinated reader will see one of (old file, new file, no file).

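For code (as opposed to hand editing), the write-then-rename dance is small. A sketch on modern Python, where `os.replace` is atomic on POSIX and also replaces an existing file on Windows — an illustration, not what the 2011-era patch or `FilePath.setContent` actually does:

```python
import os
import tempfile

def atomic_write(path, data):
    # Write to a temp file in the target directory, then move it into
    # place, so an uncoordinated reader sees only the old or new file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename-into-place
    except BaseException:
        os.unlink(tmp)  # clean up the temp file on any failure
        raise
```

Creating the temp file in the same directory matters: `os.replace` across filesystems would fail, and the rename is only atomic within one filesystem.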

Replying to zooko:

If a request happens at the same time as the access.blacklist file is being rewritten, the request may reread an incomplete copy of the file. (The process that rewrites access.blacklist could avoid this by writing the new file and then moving it into place, but the patch doesn't document that this is necessary, and also it won't work on Windows.)

If you use FilePath.setContent then your code can be simple while having this behavior. In addition, on Windows it will write to a new file, delete the old file, move the new file into place. That should, if I understand correctly, guarantee that an uncoordinated reader will see one of (old file, new file, no file).

Currently if the node sees no file (after seeing one at startup), the blacklist code will raise an exception. (I think this will just fail the request rather than crashing the node.) I suppose that fails safe, but it's still undesirable.

In any case, someone editing the file probably won't be using FilePath. They might use an editor that does the atomic rename-into-place; it's not uncommon for editors to do that on Unix, but it's not good for usability to expect users to know whether their editor does that.

If we had a tahoe command to add and remove entries from the blacklist, that command could tell the node when to reread it, maybe (or the add/remove operations could be web-API requests, but that would be more complicated).

Alternatively, we could just document that the node always needs to be restarted after editing the blacklist.


Replying to davidsarah (comment:7):

Alternatively, we could just document that the node always needs to be restarted after editing the blacklist.

How about doing that for 1.9, and then adding

```
tahoe blacklist add STORAGE_INDEX_OR_PATH
tahoe blacklist remove STORAGE_INDEX_OR_PATH
tahoe blacklist view
```

or similar commands (that would work with a running gateway) for 1.10?

Author

changes I'm making following the phone call from last weekend:

  • always check the blacklist file, not just if it was present at
    startup (prevent crash if file is deleted, removes need to reboot
    after adding first entry)
  • mention in the docs that you need to replace the file atomically
  • log SI of blocked file
  • consider changing format of blacklist file to allow spaces in reason
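A sketch of what the "always check" behavior could look like: re-stat on every request, treat a missing file as an empty blacklist, and only re-parse when the mtime changes. Names are illustrative (not the patch's actual API), and it assumes well-formed "SI reason" lines:

```python
import os

class Blacklist:
    def __init__(self, path):
        self.path = path
        self.last_mtime = None
        self.entries = {}  # storage index -> reason

    def read_blacklist(self):
        try:
            mtime = os.stat(self.path).st_mtime
        except OSError:
            # missing (or newly deleted) file just means no blacklist
            self.entries.clear()
            self.last_mtime = None
            return
        if mtime != self.last_mtime:
            with open(self.path) as f:
                # assumes every non-comment line is "SI reason..."
                self.entries = dict(
                    line.strip().split(None, 1)
                    for line in f
                    if line.strip() and not line.startswith("#"))
            self.last_mtime = mtime

    def check_storageindex(self, si):
        self.read_blacklist()
        return self.entries.get(si)  # reason string, or None if allowed
```

One caveat: mtime granularity means two rewrites within the same second could be missed, which is presumably why a test would need to nudge `last_mtime` backwards to force a reload.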
Author

Attachment blacklist2.diff (20291 bytes) added

updated patch

Author

That new patch adds the following:

 * allow comments, spaces, blank lines in access.blacklist
 * always check, not just when access.blacklist was present at boot
 * log prohibited file access (to twistd.log)
 * report blacklist parsing errors
 * mention need for atomic blacklist updates in docs

While testing, I noticed that doing the check in `NodeMaker` means that listing a directory will fail if one of the objects inside it is prohibited: the directory-listing code creates a `Filenode` for each object (to extract the readcaps/writecaps separately, I think), and that step fails. Should we fix that? By doing the check somewhere else (maybe `Filenode.read`?) we could allow directories to mention the prohibited item, which might be a usability win.

warner removed their assignment 2011-08-21 01:10:30 +00:00
daira was assigned by warner 2011-08-21 01:10:30 +00:00

Replying to warner:

While testing, I noticed that doing the check in NodeMaker means that listing a directory will fail if one of the objects inside it is prohibited: the directory-listing code creates a Filenode for each object (to extract the readcaps/writecaps separately, I think), and that step fails. Should we fix that? By doing the check somewhere else (maybe Filenode.read?) we could allow directories to mention the prohibited item, which might be a usability win.

Yes, I think we should fix this (and have a test for it).

Would moving the check to Filenode.read be sufficient? I think you need checks in MutableFileVersion as well (for both reading and writing a blacklisted mutable object). It seems more fragile than doing the check in NodeMaker; there are more places to check, and missing one would leave some blacklisted objects accessible.

Maybe it's safer to keep the check in NodeMaker, but have the directory-reading code catch the exception and omit the blacklisted Filenode. Oh, but that would mean that modifying a directory would drop all blacklisted children, which is probably not what we want (the fact that the object is blacklisted for this gateway doesn't mean that it shouldn't be preserved for access via other gateways).

I don't know; I'll leave it up to you.


In the example in webapi.rst,

```
-> error, 403 Access Prohibited: my-puppy-told-me-to
```

should be

```
-> error, 403 Access Prohibited: my puppy told me to
```

(Still reviewing.)


webapi.rst line 1964: "whether it need to be" -> "whether it needs to be"

`FileProhibited` in src/allmydata/blacklist.py: `__init__(self, reason)` should either chain to the superclass `Exception.__init__(self, reason)` or implement `__repr__`, otherwise the reason won't be included in tracebacks.

Expand `blacklist_fn` to `blacklist_filename`. (I was momentarily confused because I read `fn` as function.)

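To illustrate the chaining point: passing the reason to `Exception.__init__` is enough for it to appear in tracebacks and in `repr()`/`str()`. A minimal sketch, not the patch's exact class:

```python
class FileProhibited(Exception):
    def __init__(self, reason):
        # Chaining to Exception.__init__ puts the reason into args,
        # so str(), repr(), and tracebacks all include it.
        Exception.__init__(self, reason)
        self.reason = reason

try:
    raise FileProhibited("my puppy told me to")
except FileProhibited as e:
    print(repr(e))
```

Without the chaining call (or a custom `__repr__`), the traceback would show only the bare class name, which is exactly the problem noted above.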
    try: 
        current_mtime = os.stat(self.blacklist_fn).st_mtime
    except EnvironmentError: 
        # unreadable blacklist file means no blacklist 
        self.entries.clear() 
        return 

If the file exists but isn't readable (for instance, if we don't have permission to read it), that should not be a silent error. Make it either:

    except EnvironmentError:
        if os.path.exists(self.blacklist_fn):
            raise
        # nonexistent blacklist file means no blacklist 
        self.entries.clear() 
        return 

or

    except EnvironmentError, e:
        # unreadable blacklist file means no blacklist 
        if os.path.exists(self.blacklist_fn):
            twisted_log.err(e, "unreadable blacklist file")
        self.entries.clear() 
        return 

The scope of the try/except at line 26 of blacklist.py can be narrowed a bit; I'd put it around the for loop rather than the if.

This comment still applies:

The line wrapping changes to create_node_from_uri are not necessary; I don't object to them but they probably shouldn't be in this patch.

In allmydata/test/no_network.py, the line `c.set_default_mutable_keysize(522)` will conflict with [5171/ticket393-MDMF-2].

In `test_blacklist`:

```
self.g.clients[0].blacklist.last_mtime -= 2.0
```

Ugh, but I can't think of a cleaner way to do this, so I'll let you off ;-)

Good tests. Do we need a test that mutable files are correctly blacklisted? I suppose that's not necessary if we are checking in NodeMaker, but it would be if we were checking in immutable/filenode.py and mutable/filenode.py.

Otherwise +1.

daira removed their assignment 2011-08-21 05:32:25 +00:00
warner was assigned by daira 2011-08-21 05:32:25 +00:00

Replying to davidsarah (comment:12):

Replying to warner:

While testing, I noticed that doing the check in NodeMaker means that listing a directory will fail if one of the objects inside it is prohibited: the directory-listing code creates a Filenode for each object (to extract the readcaps/writecaps separately, I think), and that step fails. Should we fix that? By doing the check somewhere else (maybe Filenode.read?) we could allow directories to mention the prohibited item, which might be a usability win.

Yes, I think we should fix this (and have a test for it).

... but if it would delay 1.9, then let's just do it the simpler way that fails for directory listing, document that, and open a ticket to fix it.


Attachment 1425-davidsarah.darcs.patch (62252 bytes) added

Tests, implementation and docs for blacklists, equivalent to blacklist2.diff but rebased for trunk, and with an extra test that we can list a directory containing a blacklisted file. refs #1425

Author

Attachment blacklist3.diff (21114 bytes) added

updated: handle directories properly, allow listing, incorporate recommendations

Author

hrm, one wrinkle that's somehow bypassing tests meant to catch it: if you blacklist a file, access it (and get the error), then unblacklist it, the next access still throws an error. An obvious downside of monkeypatching to replace `Node.read` is that the Node might stick around: in this case in the nodemaker's cache (although that's a `WeakValueDictionary`, so there must be something else holding on to it).

The safest approach (at least one that would let you unblacklist things quickly) would be to do the blacklist check on each call to `read()`, rather than replacing `read()` with a function that always throws an exception.


Attachment blacklist4.darcs.patch (69621 bytes) added

Implementation, tests and docs for blacklists. This version allows listing directories containing a blacklisted child. fixes #1425


Hmm, the changes to `no_network.py` in [attachment:blacklist4.darcs.patch](/tahoe-lafs/trac/attachments/000078ac-c469-08d2-d41f-823262888329) aren't actually necessary, because the test no longer needs to restart the node. I'll revert those changes.

Replying to warner:

hrm, one wrinkle that's somehow bypassing tests meant to catch it: if you blacklist a file, access it (and get the error), then unblacklist it, the next access still throws an error. An obvious downside of monkeypatching to replace Node.read is that the Node might stick around: in this case in the nodemaker's cache (although that's a WeakValueDictionary so there must be something else holding on to it).

The safest approach (at least one that would let you unblacklist things quickly) would be to do the blacklist check on each call to read(), rather than replacing read() with a function that always throws an exception.

[attachment:blacklist4.darcs.patch](/tahoe-lafs/trac/attachments/000078ac-c469-08d2-d41f-823262888329) solves this problem because the nodemaker will cache the original node object, not the `ProhibitedNode` wrapper. A node will get wrapped with `ProhibitedNode` on each request, depending on whether or not it is blacklisted at that request.

A slightly odd side-effect of this patch is that prohibited directories will be treated a little more like files. This is actually quite useful because it prevents recursive operations from trying to traverse them.

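A minimal sketch of that wrapper approach (names hypothetical, not the actual patch's API): metadata calls pass through, so directory listings keep working, while content reads raise the blacklist reason. Because the wrapping decision is made per request, unblacklisting takes effect immediately even though the nodemaker caches the underlying node.

```python
# Illustrative only: the real patch's class and method names may differ.
class FileProhibited(Exception):
    pass

class ProhibitedNode:
    def __init__(self, wrapped, reason):
        self.wrapped = wrapped
        self.reason = reason

    def get_storage_index(self):
        # metadata passes through, so directory listings still work
        return self.wrapped.get_storage_index()

    def read(self, *args, **kwargs):
        # content access is refused with the blacklist reason
        raise FileProhibited(self.reason)

def maybe_wrap(node, blacklist):
    # called per request: wrap only if the SI is currently blacklisted
    reason = blacklist.get(node.get_storage_index())
    return ProhibitedNode(node, reason) if reason else node
```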
Author

[attachment:blacklist4.darcs.patch](/tahoe-lafs/trac/attachments/000078ac-c469-08d2-d41f-823262888329) : the blacklist.py file is missing,
including the new `ProhibitedNode` class, so I can't test it right
away. I assume the new class passes through a lot of methods but raises
an exception during `read()` and `download_version` (and
`download_best_version`).

Let's move the `no_network.py` cleanups to a different patch:
they're useful cleanups, but I agree it'd be better to defer them until
after the release.

I'm willing to go with this approach, if only for expediency. I've got
a few concerns that might warrant more work post-1.9:

  • we talked about this blocking FTP/SFTP too, and blocking `read()`
    ought to do that, but it'd be nice to have a test for it.
  • marking the files as prohibited on the directory listing is a nice
    touch, but I'd be happy to see this feature go in without it. Also, by
    removing the More Info link, it's harder for users to find the blocked
    filecap (and in particular the storage-index), so they can edit their
    `access.blacklist` file to unblock the file later. Maybe we could
    put "Access Prohibited" in the info column and make it a link to the
    same old More Info page as before, then add the reason to the
    more-info page? Likewise, I don't think we need to change the "type"
    column to indicate the file has been blacklisted: the strikethrough is
    plenty big enough.


Attachment blacklist5.darcs.patch (71289 bytes) added

Implementation, tests and docs for blacklists. This version allows listing directories containing a blacklisted child. Inclusion of blacklist.py fixed. fixes #1425


I removed the `no_network.py` cleanups and added `blacklist.py` (sorry about that). The other changes can wait until after the alpha.

Replying to warner:

> * marking the files as prohibited on the directory listing is a nice
>   touch, but I'd be happy to see this feature go in without it. Also, by
>   removing the More Info link, it's harder for users to find the blocked
>   filecap (and in particular the storage-index), so they can edit their
>   `access.blacklist` file to unblock the file later.

I added the reason in that column because the More Info page ended up giving
an error, and there was no point in having the link in that case. I'll have a
look at how to get the info page working and showing the blacklist reason.

> Maybe we could put "Access Prohibited" in the info column and make it a
> link to the same old More Info page as before, then add the reason to the
> more-info page? Likewise, I don't think we need to change the "type"
> column to indicate the file has been blacklisted: the strikethrough is
> plenty big enough.

I agree. The change to the type column was there more because it was slightly
easier to implement than showing the original type code (and because I didn't
have the "Access Prohibited" text at that point) than because it's necessary.


In changeset:3d7a32647c431385:

Implementation, tests and docs for blacklists. This version allows listing directories containing a blacklisted child. Inclusion of blacklist.py fixed. fixes #1425
daira added the
r/fixed
label 2011-08-25 02:26:32 +00:00
daira closed this issue 2011-08-25 02:26:32 +00:00

When releasing 1.9, we should go to extra effort to communicate what this change does and doesn't do. I've been learning that almost all users have very simpleminded models of things; for example, I think the existence of the public web gateway made most users think that Tahoe-LAFS was nothing more than some sort of online service. Explaining this feature (in the NEWS/release-notes/etc.) may be a good opportunity to explain the difference between running your own software (Tahoe-LAFS storage client) to upload or download files vs. visiting a web server (Tahoe-LAFS gateway) operated by someone else and asking them to serve files to you.

This feature lets the operator of a Tahoe-LAFS storage client (== Tahoe-LAFS gateway == the web server in question) configure their software so it refuses to serve certain files (to anyone). It does not give them any ability to affect whether other Tahoe-LAFS storage clients/gateways access those files.

How can we make this clear? Maybe the only way to make this clear is to create a variant of http://tahoe-lafs.org/~zooko/network-and-reliance-topology.png which shows multiple gateways in use, and indicate on that diagram that the blacklisting feature affects only the single gateway that chooses to use it.

Reference: tahoe-lafs/trac#1425