Object traversal is the process Twisted.Web2 uses to determine what object to use to render HTML for a particular URL. When an HTTP request comes in to the web server, the object publisher splits the URL into segments, and repeatedly calls methods which consume path segments and return objects which represent that path, until all segments have been consumed. At the core, the Web2 traversal API is very simple. However, it provides some higher level functionality layered on top of this to satisfy common use cases.
Object Traversal Basics
The root resource is the top-level object in the URL
space; it conceptually represents the URI "/". The
Twisted.Web2 object traversal and object
publishing machinery uses only two methods to locate an
object suitable for publishing and to generate the HTML from it;
these methods are described in the
interface twisted.web2.iweb.IResource
:
class IResource(Interface): """ I am a web resource. """ def locateChild(request, segments): """Locate another object which can be adapted to IResource. return: A 2-tuple of (resource, remaining-path-segments), or a deferred which will fire the above. Causes the object publishing machinery to continue on with specified resource and segments, calling the appropriate method on the specified resource. If you return (self, L{server.StopTraversal}), this instructs web2 to immediately stop the lookup stage, and switch to the rendering stage, leaving the remaining path alone for your render function to handle. """ def renderHTTP(request): """Return an IResponse or a deferred which will fire an IResponse. This response will be written to the web browser which initiated the request. """
Let's examine what happens when object traversal occurs over a very simple root resource:
from twisted.web2 import iweb, http, stream class SimpleRoot(object): implements(iweb.IResource) def locateChild(self, request, segments): return self, () def renderHTTP(self, request): return http.Response(200, stream=stream.MemoryStream("Hello, world!"))
This resource, when passed as the root resource to
server.Site
or
wsgi.createWSGIApplication
, will
immediately return itself, consuming all path segments. This
means that for every URI a user visits on a web server which
is serving this root resource, the text "Hello,
world!" will be rendered. Let's examine the value of
segments
for various values of
URI:
/foo/bar ('foo', 'bar') / ('', ) /foo/bar/baz.html ('foo', 'bar', 'baz.html') /foo/bar/directory/ ('foo', 'bar', 'directory', '')
So we see that Web2 does nothing more than split the URI on the string '/' and pass these path segments to our application for consumption. Armed with these two methods alone, we already have enough information to write applications which service any form of URL imaginable in any way we wish. However, there are some common URL handling patterns which Twisted.Web2 provides higher level support for.
locateChild in depth
One common URL handling pattern involves parents which only
know about their direct children. For example,
a Directory
object may only
know about the contents of a single directory, but if it
contains other directories, it does not know about the contents
of them. Let's examine a
simple Directory
object which
can provide directory listings and serves up objects for child
directories and files:
from twisted.web2 import resource class Directory(resource.Resource): def __init__(self, directory): self.directory = directory def renderHTTP(self, request): html = ['<ul>'] for child in os.listdir(self.directory): fullpath = os.path.join(self.directory, child) if os.path.isdir(fullpath): child += '/' html.extend(['<li><a href="', child, '">', child, '</a></li>']) html.append('</ul>') html = stream.MemoryStream(''.join(html)) return http.Response(200, stream=html) def locateChild(self, request, segments): name = segments[0] fullpath = os.path.join(self.directory, name) if not os.path.exists(fullpath): return None, () # 404 if os.path.isdir(fullpath): return Directory(fullpath), segments[1:] if os.path.isfile(fullpath): return static.File(fullpath), segments[1:]
Because this implementation of locateChild
only consumed one segment and returned the rest of them
(segments[1:]
),
the object traversal process will continue
by calling locateChild
on the returned resource and passing the partially-consumed
segments. In this way, a directory structure of any depth can be
traversed, and directory listings or file contents can be rendered for any
existing directories and files.
So, let us examine what happens when the URI "/foo/bar/baz.html" is traversed, where "foo" and "bar" are directories, and "baz.html" is a file.
- Directory('/').locateChild(request, ('foo', 'bar', 'baz.html')) - Returns Directory('/foo'), ('bar', 'baz.html')
- Directory('/foo').locateChild(request, ('bar', 'baz.html')) - Returns Directory('/foo/bar'), ('baz.html, )
- Directory('/foo/bar').locateChild(request, ('baz.html')) - Returns File('/foo/bar/baz.html'), ()
No more segments to be
consumed; File('/foo/bar/baz.html').renderHTTP(ctx)
is
called, and the result is sent to the browser.
childFactory method
Consuming one URI segment at a time by checking to see if a
requested resource exists and returning a new object is a very
common pattern. Web2's default implementation
of twisted.web2.iweb.IResource
, twisted.web2.resource.Resource
, contains an
implementation of locateChild
which
provides more convenient hooks for implementing object
traversal. One of these hooks
is childFactory
. Let us imagine for
the sake of example that we wished to render a tree of
dictionaries. Our data structure might look something like this:
tree = dict( one=dict( foo=None, bar=None), two=dict( baz=dict( quux=None)))
Given this data structure, the valid URIs would be:
- /
- /one
- /one/foo
- /one/bar
- /two
- /two/baz
- /two/baz/quux
Let us construct
a twisted.web2.resource.Resource
subclass which uses the
default locateChild
implementation
and overrides the childFactory
hook
instead:
from twisted.web2 import http, resource, stream class DictTree(resource.Resource): def __init__(self, dataDict): self.dataDict = dataDict def renderHTTP(self, request): if self.dataDict is None: content = "Leaf" else: html = ['<ul>'] for key in self.dataDict.keys(): html.extend(['<li><a href="', key, '">', key, '</a></li>']) html.append('</ul>') content = ''.join(html) return http.Response(200, stream=stream.MemoryStream(content)) def childFactory(self, request, name): if name not in self.dataDict: return None # 404 return DictTree(self.dataDict[name])
As you can see, the childFactory
implementation is considerably shorter than the equivalent locateChild
implementation would have been.
child_* methods and attributes
Often we may wish to have some hardcoded URLs which are not
dynamically generated based on some data structure. For example,
we might have an application which uses an external CSS
stylesheet, an external JavaScript file, and a folder full of
images. The twisted.web2.resource.Resource
locateChild
implementation provides a
convenient way for us to express these relationships by using
child_
prefixed methods:
from twisted.web2 import resource, http, static class Linker(resource.Resource): def renderHTTP(self, request): page = """<html> <head> <link href="css" rel="stylesheet" /> <script type="text/javascript" src="scripts" /> <body> <img src="images/logo.png" /> </body> </html>""" return http.Response(200, stream=stream.MemoryStream(page)) def child_css(self, request): return static.File('/Users/dp/styles.css') def child_scripts(self, request): return static.File('/Users/dp/scripts.js') def child_images(self, request): return static.File('/Users/dp/images/')
One thing you may have noticed is that all of the examples so
far have returned new object instances whenever they were
implementing a traversal API. However, there is no reason these
instances cannot be shared. One could for example return a
global resource instance, an instance which was previously
inserted in a dict, or lazily create and cache dynamic resource
instances on the fly. The resource.Resource
locateChild
implementation also provides a
convenient way to express that one global resource instance
should always be used for a particular url,
the child
-prefixed attribute:
class FasterLinker(Linker): child_css = static.File('/Users/dp/styles.css') child_scripts = static.File('/Users/dp/scripts.js') child_images = static.File('/Users/dp/images/')
Dots in child names
When a URL contains dots, which is quite common in normal URLs,
it is simple enough to handle these URL segments
in locateChild
or childFactory
one of the passed
segments will simply be a string containing a dot. However, it
is notimmediately obvious how one would express a URL segment
with a dot in it when
using child
-prefixed methods. The
solution is really quite simple:
class DotChildren(resource.Resource): def render(self, request): return http.Response(200, stream=""" """)
If you only wish to add a child to specific instance of DotChildren then you should use the putChild method.
rsrc = DotChildren() rsrc.putChild('child_scripts.js', static.File('/Users/dp/scripts.js'))
However if you wish to add a class attribute you can use setattr like so.
setattr(DotChildren, 'child_scripts.js', static.File('/Users/dp/scripts.js'))
The same technique could be used to install a child method with a dot in the name.
The default trailing slash handler
When a URI which is being handled ends in a slash, such as when
the '/' URI is being rendered or when a directory-like URI is
being rendered, the string '' appears in the path segments which
will be traversed. Again, handling this case is trivial inside
either locateChild
or childFactory
, but it may not be
immediately obvious
what child
-prefixed method or
attribute will be looked up. The method or attribute name which
will be used is simply child
with a
single trailing underscore.
The resource.Resource
class
provides an implementation of this method which can work in two
different ways. If the
attribute addSlash
is True, the
default trailing slash handler will
return self
. In the case
when addSlash
is True, the
default resource.Resource.renderHTTP
implementation will simply perform a redirect which
adds the missing slash to the URL.
The default trailing slash handler also returns self
if addSlash
is false, but emits a
warning as it does so. This warning may become an exception at
some point in the future.
IRequest.prepath and IRequest.postpath
During object traversal, it may be useful to discover which
segments have already been handled and which segments are
remaining to be handled. In locateChild the remaining segments
are given as the second argument. However, since all object
traversal APIs are also passed
the request
object, this information
can also be obtained via
the IRequest.prepath
and IRequest.postpath
attributes.
Conclusion
Twisted.web2 makes it easy to handle complex URL
hierarchies. The most basic object traversal
interface, twisted.web2.iweb.IResource.locateChild
,
provides powerful and flexible control over the entire object
traversal process. Web2's
canonical IResource
implementation, resource.Resource
, also includes the
convenience hooks childFactory
along
with child
-prefixed method and
attribute semantics to simplify common use cases.