A RESTful Web Interface

2003-07-21

Intro

One of the smaller yet surprisingly significant issues in developing a Pie/Echo/Atom reference interface is the choice between the full suite of HTTP methods (GET, PUT, POST, and DELETE, and potentially those in WebDAV) and just two methods (GET and POST). Tim Bray proposes the latter in his ongoing essay, A Web Interface for Web Publishing. This essay presents a sample implementation of the former. Discussion of this issue is taking place on the Pie wiki page RestEchoApiDiscuss.

Tim says of the Web,

The Web wants you to call important things "Resources" and identify them with URIs. [...] The Web wants you to access resources using a very small set of verbs; in practice, those defined for HTTP in RFC2616. GET, the most common, asks a server to send a representation of a resource. POST is designed to do the following (quoting from section 9.5 of the RFC):

  • Annotation of existing resources;
  • Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles;
  • Providing a block of data, such as the result of submitting a form, to a data-handling process;
  • Extending a database through an append operation.

PUT requests that a representation of a resource be stored, and DELETE requests that a resource be deleted.

In a weblog, those resources would be entries, comments, stories, templates, images, audio, video and more. Tim goes on to describe an interface using GET and POST but, in effect, deprecates PUT and DELETE in favor of using POST for those actions as well. He provides more arguments for that decision on the wiki.

Using the common HTTP verbs provides a pattern for working with all resources:

  • GET to retrieve a resource or perform a read-only query;
  • PUT to update a known resource or to create a named resource;
  • POST to create a new sequential resource or perform an update-query;
  • DELETE to delete a resource.

In the context of weblogs, note the difference between POST and PUT for creating new entries. Some tools create resource names (URIs) dynamically for new entries (use POST) while other tools use user-provided names for the resources (use PUT). The sample below uses PUT for entries at named URIs and POST for adding comments to those entries.
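
As a rough client-side illustration of the pattern above, the sketch below maps each kind of operation to its verb and prints (rather than runs) a matching curl command. The function name atom_request, the operation names, and the example.org URL are all hypothetical; only the verb mapping comes from the list above.

```shell
#!/bin/sh
# Hypothetical helper: print the curl command for a weblog operation.
# The operation names and URLs are illustrative, not part of atom.cgi.
atom_request() {
  op=$1
  url=$2
  file=${3:-}
  case $op in
    retrieve) echo "curl -i $url" ;;                                  # GET
    store)    echo "curl -i -X PUT --data-binary @$file $url" ;;      # named resource
    append)   echo "curl -i -X POST --data-binary @$file $url" ;;     # server picks the name
    remove)   echo "curl -i -X DELETE $url" ;;
    *)        echo "unknown operation: $op" >&2; return 1 ;;
  esac
}
```

For example, `atom_request store http://example.org/atom.cgi/a.atom entry.atom` prints `curl -i -X PUT --data-binary @entry.atom http://example.org/atom.cgi/a.atom`.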

A RESTful implementation

Let's look at an implementation of a weblog server using this pattern, atom.cgi. This server is written to the widely used Common Gateway Interface (CGI). To summarize CGI: the program receives information about the HTTP request in environment variables, reads uploaded data from its input ("standard input" in command terms), and writes its response as header lines, followed by a blank line, followed by the data to be sent (all on "standard output"). Most non-CGI server implementations are very similar. The script uses the following CGI environment variables:

  • $REQUEST_METHOD is the HTTP method (GET, PUT, etc.)
  • $PATH_INFO is the URI path, relative to the CGI script
  • $QUERY_STRING contains the text following a '?' in the HTTP request URI

   38   case $REQUEST_METHOD in
   39     GET | HEAD)
   40       if [ -z "$PATH_INFO" ]; then
   41         # No path given, redirect to home page
   42         echo "Status: 302 Found"
   43         echo "Location: http://bitsko.slc.ut.us/~ken/atom.cgi/index.atom"
   44         echo
   45       elif [ -f "$PATH_INFO" ]; then
   46         if [ "$QUERY_STRING" = "references" ]; then
   47           ./references "$PATH_INFO"
   48         else
   49           echo "Content-Type: $(file -bi "$PATH_INFO")"
   50           echo
   51           cat "$PATH_INFO"
   52         fi
   53       else
   54         echo "Status: 404 Object not found!"
   55         echo
   56       fi
   57       ;;

Line 38 sets the context of what the server is going to do, choosing a section of code based on the HTTP verb, $REQUEST_METHOD. This section handles the GET and HEAD methods (Line 39). The GET method will return content for resources that exist (Lines 40-52) or report an HTTP 404 (Object not found) error if they don't (Lines 54-55).

If no relative URI path is provided ($PATH_INFO is zero-length, Line 40), the request is for the home page. We tell the browser that the resource exists (status 302 (Found), Line 42) and where it can be found (Line 43). The browser then turns right around and asks for the home page, which falls into the next section of code.

If the resource exists on the server (testing for the file succeeds, Line 45), then we check to see if the browser is performing a 'references' query (checking $QUERY_STRING, Line 46) and, if so, we send the references for that URI (using the 'references' script, Line 47). Otherwise, we begin sending the resource by giving its content type (nicely provided by the 'file' command, Line 49) and then the content of the file itself (using the 'cat' command, Line 51).

In the case of the HEAD request, all of the same actions are performed as in a GET, but the server will only return the HTTP response header and not the content.
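
To make the header, blank line, body layout concrete, here is a minimal sketch of the GET/HEAD branch written as a function. It is a variation, not the code in atom.cgi: it drops the body for HEAD itself rather than leaving that to the web server, it uses the standard "Not Found" reason phrase, and it falls back to a generic content type when the 'file' command is missing. The name serve_resource and its arguments are made up for this example.

```shell
#!/bin/sh
# Sketch of the GET/HEAD branch as a function (hypothetical name).
# Usage: serve_resource METHOD FILE
serve_resource() {
  method=$1
  path=$2
  if [ ! -f "$path" ]; then
    echo "Status: 404 Not Found"
    echo
    return 0
  fi
  # Fall back to a generic type when the 'file' command is unavailable.
  if command -v file >/dev/null 2>&1; then
    ctype=$(file -bi "$path")
  else
    ctype="application/octet-stream"
  fi
  echo "Content-Type: $ctype"
  echo                       # blank line ends the CGI response header
  if [ "$method" != "HEAD" ]; then
    cat "$path"              # the body is sent for GET only
  fi
}
```

Calling `serve_resource HEAD index.atom` should print just the Content-Type line and the blank line, while GET follows them with the file's content.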

   58     PUT)
   59       # the returned status depends on whether this resource already exists
   60       if [ -f "$PATH_INFO" ]; then
   61         STATUS="200 OK"
   62       else
   63         STATUS="201 Created"
   64       fi
   65       cat >"$PATH_INFO"
   66       ./update_home_page
   67       echo "Status: $STATUS"
   68       echo
   69       ;;

This section handles the PUT method (Line 58). The status reported by the server depends on whether the file already exists (Line 60): 200 (OK) if it does (Line 61), or 201 (Created) if it does not and this PUT will create it (Line 63).

Line 65 uses the 'cat' command to read input from the web server on "standard input" and redirects its "standard output" to the file named by $PATH_INFO, which is being created or updated. Line 66 calls the 'update_home_page' script to update the home page based on new or changed files, then we report (Line 67) the OK or Created status we determined earlier (Lines 60-64).
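
The PUT logic is small enough to try on its own. The sketch below wraps it in a function so it can be run outside a web server; put_resource is a hypothetical name, and the call to 'update_home_page' is left out because that script is not shown here.

```shell
#!/bin/sh
# Sketch of the PUT branch as a function (hypothetical name).
# Usage: put_resource FILE < request-body
put_resource() {
  path=$1
  # Decide the status before writing, since writing creates the file.
  if [ -f "$path" ]; then
    status="200 OK"
  else
    status="201 Created"
  fi
  cat >"$path"               # copy standard input into the resource
  echo "Status: $status"
  echo
}
```

Running `echo one | put_resource entry.atom` twice in a row reports 201 (Created) the first time and 200 (OK) the second.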

   70     DELETE)
   71       rm -f "$PATH_INFO"
   72       echo "Status: 200 OK"
   73       echo
   74       ;;

The DELETE method (starting this section on Line 70) removes the file $PATH_INFO, ignoring any errors (using '-f', Line 71), and reports a 200 (OK) status (Line 72).
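
Because '-f' hides failures, the script answers 200 (OK) even when there was nothing to delete. A stricter variation, which is not what atom.cgi does, might distinguish the cases as sketched below. The function name delete_resource is hypothetical.

```shell
#!/bin/sh
# Sketch of a stricter DELETE branch (hypothetical name).
# Usage: delete_resource FILE
delete_resource() {
  path=$1
  if [ ! -f "$path" ]; then
    echo "Status: 404 Not Found"              # nothing to delete
  elif rm "$path" 2>/dev/null; then
    echo "Status: 200 OK"
  else
    echo "Status: 500 Internal Server Error"  # e.g. a permissions problem
  fi
  echo
}
```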

   75     POST)  # for comments only
   76       if [ -f "$PATH_INFO" ]; then
   77         POST_ID=1
   78         BASE_NAME="$(basename "$PATH_INFO" .atom)"
   79         # loop thru POST_IDs until one is free; lockfile creates the file
   80         # if it doesn't exist, so some other post doesn't get it first
   81         while ! lockfile -0 -r1 atom/comments/$BASE_NAME-$POST_ID.atom; do
   82           POST_ID="`expr $POST_ID + 1`"
   83         done
   84         # save the new comment
   85         chmod 644 atom/comments/$BASE_NAME-$POST_ID.atom
   86         cat >atom/comments/$BASE_NAME-$POST_ID.atom
   87         # respond with the new comment URI
   88         echo "Status: 201 Created"
   89         echo "Location: http://bitsko.slc.ut.us/~ken/atom.cgi/comments/$BASE_NAME-$POST_ID.atom"
   90         echo
   91       else
   92         echo "Status: 404 Object not found!"
   93         echo
   94       fi
   95       ;;
   96   esac

The POST method (starting this section on Line 75) is only used by this server to create sequentially numbered comments on resources, typically entries. We only allow posts if the resource exists (Line 76); otherwise we report a 404 (Object not found) error (Lines 92-93).

We start comment numbers, POST_ID, at one (Line 77) and use only the filename portion (basename) of the $PATH_INFO, without the directory portion or a '.atom' extension (Line 78). The 'lockfile' command is used (with no wait and only one try, Line 81) to both check whether a comment exists and create the empty file if it does not (so no other POST can create the comment between the time we check for it and the time we create it). Within the loop, we add one to the POST_ID to try each comment number in sequence until we find one that did not already exist (Line 82, within the loop on Lines 81-83).

When we have the newly created POST_ID file, we make it writeable (Line 85; 'lockfile' creates it read-only), then, as with PUT above, we use the 'cat' command to write the comment file (Line 86). We send a 201 (Created) status (Line 88) and report the newly created URI location of the comment (Line 89).
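
If 'lockfile' is not installed, the same check-and-create step can be approximated with the shell's noclobber option. This is a swapped-in technique, not the one atom.cgi uses, and it assumes your shell implements noclobber with an exclusive create, which is common but worth confirming. The function name claim_comment_file and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: claim the next free comment number without 'lockfile'.
# With noclobber (set -C), '>' fails if the file already exists, so
# creating the file doubles as the existence check.
# Usage: claim_comment_file DIR BASE_NAME   (prints the claimed file)
claim_comment_file() {
  dir=$1
  base=$2
  [ -d "$dir" ] || return 1        # avoid looping forever on a bad directory
  id=1
  # The subshell keeps noclobber from leaking into the caller.
  while ! ( set -C; : >"$dir/$base-$id.atom" ) 2>/dev/null; do
    id=$((id + 1))
  done
  echo "$dir/$base-$id.atom"
}
```

Unlike 'lockfile', this leaves the new file writeable, so no 'chmod' is needed before saving the comment into it.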

Line 96 ends the case statement we started on Line 38. All done!
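
One thing the case statement does not have is a fallback branch, so a request with any other method produces no response at all. A minimal sketch of such a branch follows; the dispatch function is a hypothetical stand-in with stubbed handlers, and only the final branch is the point. HTTP expects a 405 response to list the supported methods in an Allow header.

```shell
#!/bin/sh
# Sketch: dispatcher skeleton with a fallback branch (hypothetical).
# Reads $REQUEST_METHOD, as a CGI script would.
dispatch() {
  case $REQUEST_METHOD in
    GET | HEAD | PUT | DELETE | POST)
      echo "Status: 200 OK"        # placeholder for the real handlers
      echo
      ;;
    *)
      echo "Status: 405 Method Not Allowed"
      echo "Allow: GET, HEAD, PUT, DELETE, POST"
      echo
      ;;
  esac
}
```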

Wrap up

Note that this script is only a sample of a server using the common HTTP verbs -- it is not validating content, authenticating authors and posters, performing proper path checks or much other error processing.

This server allows one to manage most of the types of resources and URI locations possible at a web site: entries, comments, style sheets, photos, audio, preferences, etc. A real server would restrict the URI/file hierarchy of resources created and limit the modification of certain resources to their creators, webmasters, or the web site owner.
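
As one small example of the path checks a real server would need, the sketch below maps a request path onto a file under a fixed directory and refuses any path containing a parent-directory reference. It is only a starting point under stated assumptions: the name resolve_path and the /srv/atom root are hypothetical, and it does nothing about symbolic links or per-author permissions.

```shell
#!/bin/sh
# Sketch: map $PATH_INFO to a file under a root directory, refusing
# paths that could escape it. Names here are hypothetical.
# Usage: resolve_path ROOT PATH_INFO   (prints the file path, or fails)
resolve_path() {
  root=$1
  path_info=$2
  case $path_info in
    "" | */../* | */.. | ../* | ..) return 1 ;;  # empty, or has a '..' segment
    /*) ;;                                       # expected form: starts with '/'
    *) return 1 ;;
  esac
  echo "$root$path_info"
}
```

For example, `resolve_path /srv/atom /2003/rest.atom` prints `/srv/atom/2003/rest.atom`, while `resolve_path /srv/atom /../etc/passwd` fails without printing anything.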


REST stands for Representational State Transfer, a name coined by Roy Fielding to describe how and why the web works the way it always has. A RESTful Web Interface is an interface that works in the REST architectural style: the way the web wants to work.