A RESTful Web Interface
2003-07-21
Intro
One of the smaller yet surprisingly significant issues in the development of one of the Pie/Echo/Atom reference interfaces is a decision between the full suite of HTTP methods (like GET, PUT, POST, and DELETE, and potentially those in WebDAV) and just two methods (GET and POST). Tim Bray proposes the latter in his ongoing essay, A Web Interface for Web Publishing. This essay presents a sample implementation of the former. Discussion of this issue is happening on the Pie wiki page RestEchoApiDiscuss.
Tim says of the Web,
The Web wants you to call important things "Resources" and identify them with URIs. [...] The Web wants you to access resources using a very small set of verbs; in practice, those defined for HTTP in RFC2616. GET, the most common, asks a server to send a representation of a resource. POST is designed to do the following (quoting from section 9.5 of the RFC):
- Annotation of existing resources;
- Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles;
- Providing a block of data, such as the result of submitting a form, to a data-handling process;
- Extending a database through an append operation.
PUT requests that a representation of a resource be stored, and DELETE requests that a resource be deleted.
In a weblog, those resources would be entries, comments, stories, templates, images, audio, video and more. Tim goes on to describe an interface using GET and POST but, in effect, deprecates PUT and DELETE in favor of using POST for those actions as well. He provides more arguments for that decision on the wiki.
Using the common HTTP verbs provides a pattern for working with all resources:
- GET to retrieve a resource or perform a read-only query;
- PUT to update a known resource or to create a named resource;
- POST to create a new sequential resource or perform an update-query;
- DELETE to delete a resource.
In the context of weblogs, note the difference between POST and PUT for creating new entries. Some tools create resource names (URIs) dynamically for new entries (use POST) while other tools use user-provided names for the resources (use PUT). The sample below uses PUT for entries at named URIs and POST for adding comments to those entries.
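The verb pattern above can be sketched as a small lookup. The function name `action_for_method` and the action labels are hypothetical, chosen only for illustration. Rejecting unknown verbs is an assumption about what a server might do.

```shell
#!/bin/sh
# Hypothetical helper: print the weblog action each HTTP verb maps to.
# Returns non-zero for verbs outside the pattern described above.
action_for_method() {
    case $1 in
        GET)    echo "retrieve" ;;
        PUT)    echo "create-or-update-named" ;;
        POST)   echo "create-sequential" ;;
        DELETE) echo "delete" ;;
        *)      echo "unsupported"; return 1 ;;
    esac
}

# An entry at a user-chosen URI uses PUT; a comment with a
# server-chosen number uses POST.
action_for_method PUT
action_for_method POST
```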
A RESTful implementation
Let's look at an implementation of a weblog server using this pattern, atom.cgi. This server is written to the widely used Common Gateway Interface (CGI). To summarize CGI: the program receives information about the HTTP request in environment variables; uploaded data is read from "standard input"; and the response is written to "standard output" as header lines, followed by a blank line, followed by the data to be sent. Most non-CGI server implementations are very similar. The following CGI environment variables are used:
- $REQUEST_METHOD is the HTTP method (GET, PUT, etc.)
- $PATH_INFO is the URI path, relative to the CGI script
- $QUERY_STRING contains the text following a '?' in the HTTP request URI
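To make the CGI contract concrete, here is a minimal sketch of a handler invoked by hand, roughly the way a web server would invoke it. Only the three variable names come from CGI. The function name `show_request` and the sample path, query, and body are made up for illustration.

```shell
#!/bin/sh
# Hypothetical handler: reports what the web server passed in, using the
# CGI response shape (header lines, a blank line, then the body).
show_request() {
    echo "Content-Type: text/plain"
    echo
    echo "method=$REQUEST_METHOD"
    echo "path=$PATH_INFO"
    echo "query=$QUERY_STRING"
    # Any uploaded data arrives on standard input.
    echo "body=$(cat)"
}

# Simulate a PUT of a one-line body (all values here are sample data).
printf 'hello\n' | (
    REQUEST_METHOD=PUT
    PATH_INFO=/entries/first.atom
    QUERY_STRING=draft
    show_request
)
```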
    38 case $REQUEST_METHOD in
    39   GET | HEAD)
    40     if [ -z "$PATH_INFO" ]; then
    41       # No path given, redirect to home page
    42       echo "Status: 302 Found"
    43       echo "Location: http://bitsko.slc.ut.us/~ken/atom.cgi/index.atom"
    44       echo
    45     elif [ -f "$PATH_INFO" ]; then
    46       if [ "$QUERY_STRING" = "references" ]; then
    47         ./references "$PATH_INFO"
    48       else
    49         echo "Content-Type: `file -bi $PATH_INFO`"
    50         echo
    51         cat $PATH_INFO
    52       fi
    53     else
    54       echo "Status: 404 Object not found!"
    55       echo
    56     fi
    57     ;;
Line 38 sets the context of what the server is going to do, choosing a section of code based on the HTTP verb, $REQUEST_METHOD. This section handles the GET and HEAD methods (Line 39). The GET method will return content for resources that exist (Lines 40-52) or report an HTTP 404 (Object not found) error if they don't (Lines 54-55).
If no relative URI path is provided, $PATH_INFO is zero-length (Line 40). The request is then for the home page, so we tell the browser that we know the resource exists (status 302 (Found), Line 42) and where it can be found (Line 43). The browser will turn right around and ask for the home page, which falls into the next section of code.
If the resource exists on the server (testing for the file succeeds, Line 45), then we check whether the browser is performing a 'references' query (checking $QUERY_STRING, Line 46) and, if so, we send the references for that URI (using the 'references' script, Line 47). Otherwise, we begin sending the resource by giving its content type (nicely provided by the 'file' command, Line 49) and then the content of the file itself (using the 'cat' command, Line 51).
In the case of the HEAD request, all of the same actions are performed as in a GET, but the server will only return the HTTP response header and not the content.
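The difference between GET and HEAD can be made explicit in a small sketch. This is not how atom.cgi does it (the sample emits the same output for both methods). The helper name `send_resource` is hypothetical, and the content type is passed in by the caller here rather than detected with 'file'.

```shell
#!/bin/sh
# Hypothetical helper: emit CGI headers for a file and, for GET only,
# the file's content. Arguments: method, file path, content type.
send_resource() {
    method=$1 path=$2 ctype=$3
    if [ ! -f "$path" ]; then
        echo "Status: 404 Not Found"
        echo
        return 0
    fi
    echo "Content-Type: $ctype"
    echo
    # HEAD gets the same headers as GET but no body.
    if [ "$method" = "GET" ]; then
        cat "$path"
    fi
}

# Demo against a throwaway file (sample data only).
demo_dir=$(mktemp -d)
echo "<entry/>" > "$demo_dir/first.atom"
send_resource GET  "$demo_dir/first.atom" application/xml
send_resource HEAD "$demo_dir/first.atom" application/xml
rm -rf "$demo_dir"
```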
    58   PUT)
    59     # the returned status depends on whether this resource already exists
    60     if [ -f "$PATH_INFO" ]; then
    61       STATUS="200 OK"
    62     else
    63       STATUS="201 Created"
    64     fi
    65     cat >$PATH_INFO
    66     ./update_home_page
    67     echo "Status: $STATUS"
    68     echo
    69     ;;
This section handles the PUT method (Line 58). The status reported by the server depends on whether the file already exists (Line 60): 200 (OK) if it does (Line 61), or 201 (Created) if it does not and this PUT will create it (Line 63).
Line 65 uses the 'cat' command to read input from the web server on "standard input" and redirects its "standard output" to the file, $PATH_INFO being created or updated. Line 66 calls the 'update_home_page' script to update the home page based on new or changed files, then we report (Line 67) the OK or Created status we determined earlier (Lines 60-64).
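The PUT logic can be tried outside a web server with a short sketch. The function name `put_resource` and the demo file names are hypothetical, and the home page update step is left out.

```shell
#!/bin/sh
# Hypothetical helper: store standard input at "$1" and print the CGI
# status line followed by the blank line that ends the headers.
put_resource() {
    target=$1
    # Decide the status before writing, because the write creates the file.
    if [ -f "$target" ]; then
        status="200 OK"
    else
        status="201 Created"
    fi
    cat > "$target"
    echo "Status: $status"
    echo
}

# Demo in a throwaway directory (sample data only).
demo_dir=$(mktemp -d)
echo "first draft"  | put_resource "$demo_dir/entry.atom"   # new file
echo "second draft" | put_resource "$demo_dir/entry.atom"   # existing file
rm -rf "$demo_dir"
```

The first call reports 201 (Created) and the second reports 200 (OK), since by then the file exists.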
    70   DELETE)
    71     rm -f $PATH_INFO
    72     echo "Status: 200 OK"
    73     echo
    74     ;;
The DELETE method (starting this section on Line 70) removes the file $PATH_INFO, ignoring any errors (using '-f', Line 71), and reports a 200 (OK) status (Line 72).
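A stricter variant is possible. The sketch below is not what atom.cgi does: it assumes a server that prefers to report 404 when there is nothing to delete. The function name `delete_resource` is hypothetical.

```shell
#!/bin/sh
# Hypothetical stricter variant: report 404 when the file does not exist,
# instead of always answering 200.
delete_resource() {
    target=$1
    if [ -f "$target" ]; then
        rm -f "$target"
        echo "Status: 200 OK"
    else
        echo "Status: 404 Not Found"
    fi
    echo
}

# Demo in a throwaway directory (sample data only).
demo_dir=$(mktemp -d)
echo "entry" > "$demo_dir/entry.atom"
delete_resource "$demo_dir/entry.atom"   # file exists
delete_resource "$demo_dir/entry.atom"   # already gone
rm -rf "$demo_dir"
```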
    75   POST) # for comments only
    76     if [ -f "$PATH_INFO" ]; then
    77       POST_ID=1
    78       BASE_NAME="`basename $PATH_INFO .atom`"
    79       # loop thru POST_IDs until one is free; lockfile creates the file
    80       # if it doesn't exist, so some other post doesn't get it first
    81       while ! lockfile -0 -r1 atom/comments/$BASE_NAME-$POST_ID.atom; do
    82         POST_ID="`expr $POST_ID + 1`"
    83       done
    84       # save the new comment
    85       chmod 644 atom/comments/$BASE_NAME-$POST_ID.atom
    86       cat >atom/comments/$BASE_NAME-$POST_ID.atom
    87       # respond with the new comment URI
    88       echo "Status: 201 Created"
    89       echo "Location: http://bitsko.slc.ut.us/~ken/atom.cgi/comments/$BASE_NAME-$POST_ID.atom"
    90       echo
    91     else
    92       echo "Status: 404 Object not found!"
    93       echo
    94     fi
    95     ;;
    96 esac
The POST method (starting this section on Line 75) is only used by this server to create sequentially numbered comments on resources, typically entries. We only allow posts if the resource exists (Line 76); otherwise we report a 404 (Object not found) error (Lines 92-93).
We start comment numbers, POST_ID, at one (Line 77) and use only the filename portion (basename) of $PATH_INFO, without the directory portion or the '.atom' extension (Line 78). The 'lockfile' command is used (with no wait and only one try, Line 81) both to check whether a comment exists and to create the empty file if it does not, so no other POST can create the comment between the time we check for it and the time we create it. Within the loop (Lines 81-83), we add one to POST_ID (Line 82) to try each comment number in sequence until we find one that does not yet exist.
When we have the newly created POST_ID file, we make it writeable (Line 85; 'lockfile' creates it read-only), then like PUT above, we use the 'cat' command to write the comment file (Line 86). We send a 201 (Created) status (Line 88) and report the newly created URI location of the comment (Line 89).
Line 96 ends the case statement we started on Line 38. All done!
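The comment-numbering loop in the POST section can also be sketched without 'lockfile'. This is a swapped-in technique, not the one atom.cgi uses: with the shell's noclobber option ('set -C'), the '>' redirection fails if the file already exists, so checking and creating happen in one step. The function name `claim_comment_id` and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: claim the first free "<dir>/<base>-N.atom" by
# creating it, then print N. Arguments: directory, base name.
claim_comment_id() {
    dir=$1 base=$2 id=1
    # In the subshell, 'set -C' makes '>' refuse to overwrite an existing
    # file, so the redirection succeeds only for an unused number.
    while ! ( set -C; : > "$dir/$base-$id.atom" ) 2>/dev/null; do
        id=$((id + 1))
    done
    echo "$id"
}

# Demo in a throwaway directory (sample data only).
demo_dir=$(mktemp -d)
claim_comment_id "$demo_dir" first   # prints 1
claim_comment_id "$demo_dir" first   # prints 2
rm -rf "$demo_dir"
```

Unlike 'lockfile', this creates the file with the usual permissions, so the 'chmod' step would not be needed in this variant.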
Wrap up
Note that this script is only a sample of a server using the common HTTP verbs: it does not validate content, authenticate authors and posters, perform proper path checks, or do much other error processing.
This server allows one to manage most of the types of resources and URI locations possible at a web site: entries, comments, style sheets, photos, audio, preferences, etc. A real server would restrict the URI/file hierarchy of resources created and limit the modification of certain resources to their creators, webmasters, or the web site owner.
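As one example of the path checks a real server would need, here is a minimal sketch that accepts only relative paths with no parent-directory component. The function name `is_safe_path` and the allowed character set are assumptions, and this is not a complete defense.

```shell
#!/bin/sh
# Hypothetical check: succeed only for relative paths made of letters,
# digits, '.', '_', '-' and '/', with no ".." component.
is_safe_path() {
    case $1 in
        "" | /* )                  return 1 ;;  # empty or absolute
        *[!A-Za-z0-9._/-]* )       return 1 ;;  # unexpected character
        .. | ../* | */.. | */../* ) return 1 ;;  # parent-directory component
    esac
    return 0
}

# Demo with sample paths.
for p in entries/first.atom ../secret /etc/passwd; do
    if is_safe_path "$p"; then
        echo "accept $p"
    else
        echo "reject $p"
    fi
done
```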
REST stands for Representational State Transfer, a name coined by Roy Fielding to describe how and why the web works the way it always has. A RESTful Web Interface is an interface that works in the REST Architectural Style: the way the web wants to work.