Add a concise table of the URL tree to webapi.rst. #1663

Open
opened 2012-01-21 23:52:08 +00:00 by nejucomo · 13 comments
nejucomo commented 2012-01-21 23:52:08 +00:00
Owner

I would like to see a section of webapi.rst with a concise table of all handled URLs (or prefixes). For each URL, a complete list of the operations that can be initiated through it would be extremely useful.

Exhaustive query parameters per operation might make the table too cluttered.

The use case: an operations engineer under time pressure, without specific knowledge of Tahoe-LAFS, needs to understand the URL structure in order to implement access control, caching, load balancing, or other operational policies.

tahoe-lafs added the
unknown
major
enhancement
1.9.0
labels 2012-01-21 23:52:08 +00:00
tahoe-lafs added this to the undecided milestone 2012-01-21 23:52:08 +00:00
nejucomo commented 2012-01-22 01:28:50 +00:00
Author
Owner

As a first attempt to do this automatically, I learned that Nevow Pages delegate requests for sub-URL paths by looking for child_<path segment> attributes or methods. A quick grep gives a basic, partial picture of the URL namespace (see below).

The allmydata.web.root.Root class is a good starting point. There are at least these top-level handlers:

  • /operations
  • /storage
  • /uri
  • /cap
  • /file
  • /named
  • /status
  • /statistics

This is based on this simplistic grep:

$ find src/allmydata/web -type f -name '*.py' -print0 | xargs -0 grep -E '^class |child_' | grep -B 1 'child_'
src/allmydata/web/directory.py:class DirectoryNodeHandler(RenderMixin, rend.Page, ReplaceMeMixin):
src/allmydata/web/directory.py:        d = self.node.move_child_to(from_name, self.node, to_name, replace)
--
src/allmydata/web/introweb.py:class IntroducerRoot(rend.Page):
src/allmydata/web/introweb.py:    child_operations = None
--
src/allmydata/web/root.py:class Root(rend.Page):
src/allmydata/web/root.py:        self.child_operations = operations.OphandleTable(clock)
src/allmydata/web/root.py:        self.child_storage = storage.StorageStatus(s)
src/allmydata/web/root.py:        self.child_uri = URIHandler(client)
src/allmydata/web/root.py:        self.child_cap = URIHandler(client)
src/allmydata/web/root.py:        self.child_file = FileHandler(client)
src/allmydata/web/root.py:        self.child_named = FileHandler(client)
src/allmydata/web/root.py:        self.child_status = status.Status(client.get_history())
src/allmydata/web/root.py:        self.child_statistics = status.Statistics(client.stats_provider)
src/allmydata/web/root.py:    def child_helper_status(self, ctx):
src/allmydata/web/root.py:    child_provisioning = provisioning.ProvisioningTool()
src/allmydata/web/root.py:        child_reliability = reliability.ReliabilityTool()
src/allmydata/web/root.py:        child_reliability = NoReliability()
src/allmydata/web/root.py:    child_report_incident = IncidentReporter()
src/allmydata/web/root.py:    #child_server # let's reserve this for storage-server-over-HTTP
--
src/allmydata/web/status.py:class DownloadStatusPage(DownloadResultsRendererMixin, rend.Page):
src/allmydata/web/status.py:    def child_timeline(self, ctx):
src/allmydata/web/status.py:    def child_event_json(self, ctx):
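
The grep above also matches calls such as move_child_to, which are not URL handlers. A minimal Python sketch of a stricter scan follows, assuming handlers are always declared as child_<segment> attributes or methods as described above. The function name child_segments and the embedded sample lines are illustrative only and are not part of Tahoe-LAFS.

```python
import re

# Match "child_<segment>" only where it begins an attribute assignment or a
# method definition, so calls like "move_child_to(...)" and commented-out
# lines are skipped.
CHILD_RE = re.compile(r"^\s*(?:def\s+|self\.)?child_(\w+)\s*(?:=|\()")


def child_segments(source_lines):
    """Return the URL path segments implied by child_* definitions, in order."""
    segments = []
    for line in source_lines:
        match = CHILD_RE.match(line)
        if match and match.group(1) not in segments:
            segments.append(match.group(1))
    return segments


# Sample lines copied from the grep output above (hypothetical input for the sketch).
SAMPLE = """\
        d = self.node.move_child_to(from_name, self.node, to_name, replace)
        self.child_operations = operations.OphandleTable(clock)
        self.child_uri = URIHandler(client)
    def child_helper_status(self, ctx):
    child_provisioning = provisioning.ProvisioningTool()
    #child_server # let's reserve this for storage-server-over-HTTP
""".splitlines()

print(["/" + segment for segment in child_segments(SAMPLE)])
```

This still only finds statically declared children; anything registered at runtime (for example via putChild) would need a separate pass.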
nejucomo commented 2012-01-22 01:29:20 +00:00
Author
Owner

See also: #1662

nejucomo commented 2012-01-22 01:41:14 +00:00
Author
Owner

The URL handling path can also be modified by the putChild method. A grep shows two cases of static file handling:

  • At the end of allmydata.web.root.Root.__init__, all files in the static resource directory are added to the root URL namespace. For my repository that gives:
      • /d3-2.4.6.min.js
      • /d3-2.4.6.time.min.js
      • /download_status_timeline.js
      • /icon.png
      • /jquery-1.6.1.min.js
      • /tahoe.css
  • In allmydata.webish.WebishServer.buildServer, if staticdir is truthy (its default resolves to ~/.tahoe/public_html), it gets added under this subpath:
      • /static/
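
To reproduce the first list for a given checkout, something like the following Python sketch could be used. It assumes, as described above, that every regular file in the static resource directory is exposed directly under the root URL namespace; the function name static_url_paths and the example directory are hypothetical.

```python
import os


def static_url_paths(static_dir, prefix="/"):
    """Map each regular file in static_dir to the URL path it would be served at.

    Assumption: files are exposed one level deep under `prefix`;
    subdirectories are ignored in this sketch.
    """
    paths = []
    for name in sorted(os.listdir(static_dir)):
        if os.path.isfile(os.path.join(static_dir, name)):
            paths.append(prefix + name)
    return paths


if __name__ == "__main__":
    # Hypothetical location of the static resource directory in a checkout.
    example_dir = "src/allmydata/web/static"
    if os.path.isdir(example_dir):
        for path in static_url_paths(example_dir):
            print(path)
```

Calling static_url_paths(os.path.expanduser("~/.tahoe/public_html"), prefix="/static/") would give the corresponding list for the second case.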
zooko commented 2012-01-23 05:57:57 +00:00
Author
Owner

Good start! Thanks! Whoever wants to write a patch for the docs based on this should go ahead and mark it as review-needed.

nejucomo commented 2012-01-25 04:09:04 +00:00
Author
Owner

I feel only moderate confidence that the URL structure above is complete enough to make sound URL-path-based access control decisions.

For example, when I request a directory node, my browser makes a request for /webform_css but I haven't found the source of this request processing. A quick search suggests it's in a dependency called formless:

$ find tahoe-lafs/ -iname '*webform*'
$ find tahoe-lafs/ -type f -print0 | xargs -0 grep -i webform
tahoe-lafs/src/allmydata/windows/tahoesvc.py:            from formless import webform, processors, annotate, iformless
tahoe-lafs/src/allmydata/windows/tahoesvc.py:                context, flatmdom, flatstan, twist, webform, processors, annotate, iformless, Decimal,
tahoe-lafs/static/tahoe.py:from formless import webform, processors, annotate, iformless
tahoe-lafs/static/tahoe.py:    context, flatmdom, flatstan, twist, webform, processors, annotate, iformless, Decimal,
$ find ~/virtualenvs/default/ -iname '*webform*'
/home/n/virtualenvs/default/lib/python2.7/site-packages/formless/webform.py
/home/n/virtualenvs/default/lib/python2.7/site-packages/formless/webform.pyc
$ grep webform_css /home/n/virtualenvs/default/lib/python2.7/site-packages/formless/webform.py
$ grep -i webform_css /home/n/virtualenvs/default/lib/python2.7/site-packages/formless/webform.py
$ grep -i webform /home/n/virtualenvs/default/lib/python2.7/site-packages/formless/webform.py
            return webform.renderForms(
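
To show why the gap matters, here is a rough Python sketch of a default-deny prefix check built only from the top-level handlers found so far. The names KNOWN_PREFIXES and is_known_path are illustrative, and the prefix list is known to be incomplete.

```python
# Top-level handlers identified earlier in this ticket (an incomplete list).
KNOWN_PREFIXES = (
    "/operations",
    "/storage",
    "/uri",
    "/cap",
    "/file",
    "/named",
    "/status",
    "/statistics",
)


def is_known_path(path):
    """Return True if path is exactly a known prefix or lies beneath one."""
    return any(path == p or path.startswith(p + "/") for p in KNOWN_PREFIXES)


for candidate in ("/uri/URI:CHK:example", "/status", "/webform_css"):
    print(candidate, is_known_path(candidate))
```

A policy built this way would reject /webform_css, so directory pages would lose their form stylesheet until the table is complete.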
nejucomo commented 2012-01-25 04:55:21 +00:00
Author
Owner

The results of this ticket would inform tickets #1665, #860, and #587.

tahoe-lafs added
documentation
and removed
unknown
labels 2012-03-12 19:26:09 +00:00
tahoe-lafs added
normal
and removed
major
labels 2012-03-29 19:11:53 +00:00
marlowe commented 2012-06-06 03:57:39 +00:00
Author
Owner

22:56 < nejucomo> marlowe: The first pass could contain path-pattern,
http-methods, interesting-query-parameters, and a short note
about what operations are available there.

marlowe commented 2012-06-06 03:58:03 +00:00
Author
Owner

22:56 < nejucomo> marlowe: The first pass could contain path-pattern,
http-methods, interesting-query-parameters, and a short note
about what operations are available there.

marlowe commented 2012-06-06 03:59:31 +00:00
Author
Owner

22:57 < nejucomo> Actually, that might be even too specific for the first pass.
Maybe just path-pattern and a short description of what
purpose that path serves (especially which information it
leaks and what state it can change).
22:58 < marlowe> change noted
22:58 < nejucomo> Something like: "/file/<CAP>" - "This url path is used to read
and update files in the grid." "/status/<…>" - "This path
gives status for current upload, download, verification, and
repair operations."

davidsarah commented 2012-11-19 00:42:40 +00:00
Author
Owner

thedod wrote at #1866:

At [the WAPI doc]source:git/docs/frontends/webapi.rst there should be a list of all prefix options, followed by a list of sections describing each.

/cap appears as an orphan line somewhere on the page, and it's not clear from the text whether we should use it and when.

/file appears inside the /named section, and there's also no mention of the variant that contains a /@@named=/ component (the WAPI [WUI] actually links to such urls).
What does it mean (removing the component doesn't seem to matter)? Which of the 3 options (/named/ or /file/ with or without /@@named=/) is preferred when?

Another thing I can't find (maybe it's on some other doc, but then webapi.rst should link to it): list (+ explanation) of all cap types: DIR2-RO, DIR2-CHK, CHK, etc. (and in general - explanation of cap uri syntax).
This (or a link to it) should appear before explaining urls that contain a cap :)

davidsarah commented 2012-11-19 00:45:38 +00:00
Author
Owner

zooko wrote at #1866:

Okay, to close this ticket, update webapi.rst to answer all these questions. Here are some answers you could use to that end...

/cap was a plan that I had to rename "uri" to "cap" everywhere. I thought it was more helpful to users to call those things caps instead of uris.

Part of why Brian had agreed to go along with this was that Kevin Reid emphasized to us that we're not supposed to call a thing a "uri" unless it has some sort of official recognition from some namespace allocator like IANA or something.

We wound up changing most but not all of the things that were easy to change -- the docs and some of the source code -- but not changing how it is spelled in the WAPI.

I guess we should consider resuming that process of renaming, if only because a half-renamed thing is almost as bad as a consistently bad badly-named thing. :-/

+1 (a half-renamed thing is worse, IMHO)

Anyway, /cap ought to be a synonym of /uri, but I'm not sure what happens if you actually use it.

The /@@named=/ feature is kind of complicated. The goal is: tell the web server (tahoe-lafs gateway) that the resource you want to download is a certain cap, e.g. "URI:CHK:egrocatgmbuoqra3e3jptkzvwe:543sre2wsjmqwbk73in76oqaemi35iqeyzggavc4vp6kkvc43nkq:1:1:948821", but at the same time tell the web browser that the resource you are fetching is named something like "Murphy-2012-Deaths!*Preliminary_Data_For_2010.pdf". The way we do this is by appending a string after the cap which will be ignored by the server (LAFS gateway), but which will make the browser think that the file has that name. So, for example /uri/URI:CHK:egrocatgmbuoqra3e3jptkzvwe:543sre2wsjmqwbk73in76oqaemi35iqeyzggavc4vp6kkvc43nkq:1:1:948821/@@named=/Murphy-2012-Deaths*Preliminary_Data_For_2010.pdf.

Now, the further complication is that if the cap is a dir cap as opposed to a file cap, then /uri/URI:DIR2-MDMF-RO:ppnrefnrnovjpoiv3jirjnpoim:obhqprvm6hafvarzzssrawgazx6p6tgopi4fslirhelg7xqyfr6a/@@named=/foo could be interpreted by the web server (LAFS gateway) as meaning "Get the child out of the dir whose name is @@named= and then treat that child as a directory and look in that for a child of it named foo". In order to avoid that misinterpretation, we added the /file/ prefix, used instead of /uri/, to specify that this is not a dir.
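
The URL shape described above can be sketched in Python as follows. This is only an illustration: the function name named_download_path is hypothetical, percent-encoding the filename is an assumption about what a client would want, and the cap is passed through unchanged, as in the examples above.

```python
from urllib.parse import quote


def named_download_path(filecap, filename):
    """Build a /file/<cap>/@@named=/<filename> path for a file cap.

    Assumptions: the cap is inserted as-is (colons unescaped), and the
    trailing filename, which the gateway ignores, is percent-encoded so
    it is safe to embed in a URL.
    """
    return "/file/" + filecap + "/@@named=/" + quote(filename, safe="")


# File cap taken from the example above; the filename is hypothetical.
CAP = "URI:CHK:egrocatgmbuoqra3e3jptkzvwe:543sre2wsjmqwbk73in76oqaemi35iqeyzggavc4vp6kkvc43nkq:1:1:948821"
print(named_download_path(CAP, "report 2010.pdf"))
```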

Here was a thread about this on tahoe-dev long ago:

https://tahoe-lafs.org/pipermail/tahoe-dev/2008-May/000573.html

Frankly, the resulting API is kind of weird and I wonder if we couldn't come up with a simpler and better one!

Now as to the list of cap types and cap syntax, there are at least the following two docs, and they should be cross-linked, and linked to from webapi.rst, and probably unified:

  • wiki/Capabilities
  • source:git/docs/specifications/uri.rst
zooko commented 2012-11-19 21:43:46 +00:00
Author
Owner

Created #1868 (rename "uri" to "cap" everywhere).

davidsarah commented 2012-11-20 02:59:31 +00:00
Author
Owner

Replying to zooko:

Created #1868 (rename "uri" to "cap" everywhere).

Duplicate of #1715 (change all docs and generated URLs to point to "/cap" instead of "/uri"). Note my disagreement in ticket:1715#comment:128758.

Reference: tahoe-lafs/trac-2024-07-25#1663