OpenSearch API

OpenSearch is a protocol in which a search service accepts URL parameters that specify the user's query, the starting position in the results, the number of results to return, and so on, then responds with an XML document in either RSS or Atom format.

URL Parameters

For TNH (The New Hotness), the URL parameters are as follows:

Param  Default  Multi?  Description
q               No      query
n      10       No      number of hits
p      0        No      position
s               Yes     site, default is all
h      1        No      max hits per site, 0=all
i               Yes     index to search, default is all
c               Yes     collection to search, default is all
t               Yes     type: text/html, application/pdf, etc.

A core idea is to keep the parameter names terse, so that URLs stay as short as possible, especially for the parameters that can be repeated.
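To make the single-valued vs. repeatable split concrete, here is a minimal Python sketch of how a server might read these parameters with the standard library's urllib.parse.parse_qs, which already collects repeated keys into lists. The function name parse_params, rejecting a repeated single-valued parameter, and treating a missing q as an empty query are all assumptions for illustration, not TNH's actual behavior.

```python
from urllib.parse import parse_qs

# Defaults taken from the table above; the empty default for q is an assumption.
SINGLE_DEFAULTS = {"q": "", "n": "10", "p": "0", "h": "1"}
# Repeatable parameters; an empty list means "all".
MULTI = ("s", "i", "c", "t")


def parse_params(query_string: str) -> dict:
    """Hypothetical server-side parsing of the TNH URL parameters."""
    raw = parse_qs(query_string)  # e.g. "s=a&s=b" -> {"s": ["a", "b"]}
    params = {}
    for name, default in SINGLE_DEFAULTS.items():
        values = raw.get(name, [default])
        if len(values) > 1:
            # Assumed policy: single-valued parameters may appear only once.
            raise ValueError(f"parameter {name!r} may only be given once")
        params[name] = values[0]
    for name in ("n", "p", "h"):
        params[name] = int(params[name])
    for name in MULTI:
        params[name] = raw.get(name, [])
    return params
```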

User full-text query

Param  Default  Multi?  Description
q               No      query

This is the query that the user types into the search box on the HTML page. The query is matched only against the following fields:

  • title
  • content
  • url

Furthermore, the user's query is "automagically" transformed into a low-level Lucene query that searches for all the query terms in those fields in various combinations, scoring more desirable combinations higher than others.

The idea is that it "just works". For example, if you search for Internet Archive, you should see results where both of those words appear, ideally together, but possibly some distance apart. The "automagic" query transformation does all this under the hood. See "Don't phrase me, bro!" for details.
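The real transformation is described in "Don't phrase me, bro!"; purely as a rough illustration, here is a sketch that rewrites a free-text query into Lucene's classic query-parser syntax. Every term is required in at least one of title, content, or url, and optional phrase clauses reward terms that appear together or near each other. The function name expand_query, the boosts, and the slop of 10 are made-up values, not TNH's.

```python
import re

FIELDS = ("title", "content", "url")
# Hypothetical weights for phrase matches; not TNH's actual values.
PHRASE_BOOSTS = {"title": 4, "content": 2}
SLOP = 10  # assumed proximity window (in term positions) for "some distance apart"


def expand_query(user_query: str) -> str:
    """Rewrite a free-text query into Lucene classic query-parser syntax."""
    # Keep only word characters, so no Lucene special characters need escaping.
    terms = re.findall(r"\w+", user_query.lower())
    # Every term is required (+) and may match in any of the searchable fields.
    clauses = ["+(" + " ".join(f"{field}:{term}" for field in FIELDS) + ")"
               for term in terms]
    if len(terms) > 1:
        phrase = " ".join(terms)
        for field, boost in PHRASE_BOOSTS.items():
            # Optional clauses: an exact phrase matches both of these, so it
            # outscores terms that are merely near each other.
            clauses.append(f'{field}:"{phrase}"^{boost}')
            clauses.append(f'{field}:"{phrase}"~{SLOP}')
    return " ".join(clauses)
```

For example, expand_query("Internet Archive") requires both terms somewhere in the three fields and adds optional boosted phrase clauses for title and content.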

Paging

Param  Default  Multi?  Description
n      10       No      number of hits
p      0        No      position

These two parameters are used for paging through the results and usually are not set by the end user directly: n is the number of hits to return, and p is the position of the first hit to return, starting from 0.
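A minimal paging sketch, assuming p is a zero-based hit offset (as opensearch:startIndex is in the XML response) rather than a page number. The base URL and the helper names page_url and page_count are hypothetical.

```python
from urllib.parse import urlencode


def page_url(base: str, q: str, page: int, n: int = 10) -> str:
    """URL for the zero-based page `page`, assuming p is a hit offset."""
    return base + "?" + urlencode({"q": q, "n": n, "p": page * n})


def page_count(total_results: int, n: int = 10) -> int:
    """Number of pages needed to show total_results hits, n per page."""
    return (total_results + n - 1) // n  # integer ceiling division
```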

Site

Param  Default  Multi?  Description
s               Yes     site, default is all
h      1        No      max hits per site, 0=all

These two parameters are often used together. The s parameter limits the search to specific sites, while the h parameter specifies the maximum number of hits to show from any one site.

Most of the time, users want to see results from all sites, but no more than one hit per site. That is the default.
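To illustrate what h means, here is a sketch that applies a per-site cap to an already-ranked list of hits. It only models the semantics and is not how TNH implements it; the (site, url) tuple and the name cap_per_site are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Hit = Tuple[str, str]  # hypothetical (site, url) hit record, in ranked order


def cap_per_site(hits: Iterable[Hit], h: int = 1) -> List[Hit]:
    """Keep at most h hits from each site, preserving rank order; h == 0 keeps all."""
    if h == 0:
        return list(hits)
    per_site = Counter()  # hits kept so far, per site
    kept = []
    for site, url in hits:
        if per_site[site] < h:
            per_site[site] += 1
            kept.append((site, url))
    return kept
```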

However, if a user wants to see all the results from a particular site, or a small set of sites, the query string could be:

?q=foo&s=site1.org&s=site2.net&h=0

This would limit the results to the two sites, showing all the hits from each.
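Building such a URL by hand is easy to get wrong; Python's urllib.parse.urlencode with doseq=True emits one s=... pair per site. A minimal sketch (the helper name site_search_query is hypothetical):

```python
from typing import Iterable
from urllib.parse import urlencode


def site_search_query(q: str, sites: Iterable[str], h: int = 0) -> str:
    """Query string limiting the search to the given sites."""
    # doseq=True turns a list value into one key=value pair per element.
    return "?" + urlencode({"q": q, "s": list(sites), "h": h}, doseq=True)
```

Calling site_search_query("foo", ["site1.org", "site2.net"]) reproduces the query string above, ?q=foo&s=site1.org&s=site2.net&h=0.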

Index and collection

Param  Default  Multi?  Description
i               Yes     index to search, default is all
c               Yes     collection to search, default is all

These two parameters control things that might seem independent but in practice are intertwined.

If a deployment has only a single Lucene index, but that one index contains multiple collections, then the c parameter can limit results to one or more collections.

But if a deployment uses separate on-disk Lucene indexes to manage collections, as Archive-It does, then the i parameter should be used to limit the search to just the desired index or indexes.

This is potentially confusing for end users, so these parameters should only be exposed through a UI/front-end that knows about the deployment and uses either c or i as appropriate.

Lastly, the two could be used together, with multiple collections spread across multiple indexes, but the combination of the two parameters could be difficult to keep track of and manage.
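A sketch of the front-end logic described above, assuming the front-end keeps a per-deployment flag saying whether each collection lives in its own index, and that collection IDs double as index IDs in that case (the example response below shows index 414 and collection 414, but treating that as a general rule is an assumption). Deployment and collection_params are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Deployment:
    """Hypothetical description of a deployment, known to the front-end."""
    index_per_collection: bool  # True for Archive-It-style deployments


def collection_params(deployment: Deployment,
                      collection_ids: List[str]) -> List[Tuple[str, str]]:
    """Turn the user's chosen collections into i=... or c=... URL parameters."""
    # Assumes collection IDs double as index IDs when each collection has its own index.
    name = "i" if deployment.index_per_collection else "c"
    return [(name, cid) for cid in collection_ids]
```

The returned pairs can be passed to urllib.parse.urlencode together with q, so the end user never has to know which of the two parameters applies.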

Content/document types

Param  Default  Multi?  Description
t               Yes     content type: text/html, application/pdf, etc.

This parameter limits the results to documents whose type matches the given value. It can be given multiple times, in which case documents matching any of the values are returned.

Example:

?q=foo&t=application/pdf&t=application/x-pdf

The OpenSearch service doesn't have any particular knowledge of content types. It's up to whoever builds the index to canonicalize MIME types (assuming that's what you want).
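Since canonicalization is left to the indexer, here is one possible approach, sketched as a function an indexing pipeline might call on each document's type before storing it. The name canonical_mime and the alias table are hypothetical; application/x-pdf comes from the example above.

```python
# Hypothetical alias table mapping non-canonical types to the form stored in the index.
MIME_ALIASES = {
    "application/x-pdf": "application/pdf",
}


def canonical_mime(raw: str) -> str:
    """Normalize a Content-Type value before indexing it."""
    # Drop parameters such as "; charset=UTF-8" and normalize case and whitespace.
    base = raw.split(";", 1)[0].strip().lower()
    return MIME_ALIASES.get(base, base)
```

With types canonicalized this way at index time, a search could use t=application/pdf alone instead of listing both variants.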

XML Response

XML Namespaces

The OpenSearch specification declares an XML namespace for its extensions to RSS and Atom. Similarly, we declare a namespace for our extensions.

OpenSearch       http://a9.com/-/spec/opensearchrss/1.0/
The New Hotness  http://web.archive.org/-/spec/opensearchrss/1.0/

Example response snippet:

<rss version="2.0"
     xmlns:opensearch="http://a9.com/-/spec/opensearchrss/1.0/"
     xmlns:archive="http://web.archive.org/-/spec/opensearchrss/1.0/">
  <channel>
    <title>texas</title>
    <description>texas</description>
    <link />
    <opensearch:totalResults>8996205</opensearch:totalResults>
    <opensearch:startIndex>0</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <archive:query>texas</archive:query>
    <archive:index>414</archive:index>
    <archive:urlParams>
      <archive:param name="q" value="texas" />
      <archive:param name="i" value="414" />
    </archive:urlParams>
    <item>
      <title>Texas Musical Drama</title>
      <link>http://www.texas-show.com/</link>
      <archive:docId>8185585</archive:docId>
      <archive:score>2.815091</archive:score>
      <archive:site>www.texas-show.com</archive:site>
      <archive:length>30373</archive:length>
      <archive:type>text/html</archive:type>
      <archive:collection>414</archive:collection>
      <date>20090706012618</date>
      <description>&lt;B&gt;Texas&lt;/B&gt; Musical Drama Home...</description>
    </item>
    <item>...</item>
    <archive:responseTime>0.985</archive:responseTime>
  </channel>
</rss>
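A minimal sketch of consuming this response with Python's xml.etree.ElementTree, which resolves the opensearch: and archive: prefixes through a namespaces mapping. The function name parse_response and the shape of the returned dict are assumptions; only a few of the fields shown above are extracted.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the table above; the prefixes match the example response.
NS = {
    "opensearch": "http://a9.com/-/spec/opensearchrss/1.0/",
    "archive": "http://web.archive.org/-/spec/opensearchrss/1.0/",
}


def parse_response(xml_text: str) -> dict:
    """Extract paging info and basic hit fields from an RSS response."""
    channel = ET.fromstring(xml_text).find("channel")
    hits = []
    for item in channel.findall("item"):
        hits.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "site": item.findtext("archive:site", namespaces=NS),
            # Missing scores default to 0.0 in this sketch.
            "score": float(item.findtext("archive:score", default="0", namespaces=NS)),
        })
    return {
        "total": int(channel.findtext("opensearch:totalResults", namespaces=NS)),
        "start": int(channel.findtext("opensearch:startIndex", namespaces=NS)),
        "per_page": int(channel.findtext("opensearch:itemsPerPage", namespaces=NS)),
        "hits": hits,
    }
```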