OpenSearch API
OpenSearch is a protocol in which a search service accepts URL parameters that specify the user query, the starting position in the results, the number of results to return, and so on. The service then responds with an XML document in either RSS or Atom format.
URL Parameters
For TNH, the URL parameters are as follows:
| Param | Default | Multi? | Description |
|---|---|---|---|
| q | | N | query |
| n | 10 | N | num hits |
| p | 0 | N | position |
| s | | Y | site, default is all |
| h | 1 | N | max hits per site, 0=all |
| i | | Y | index to search, default is all |
| c | | Y | collection to search, default is all |
| t | | Y | type: text/html, application/pdf, etc. |
A core idea is to keep the parameter names terse, so that for the ones that can be specified multiple times, the URL is kept as short as possible.
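As a rough illustration of how a client might assemble such a URL, here is a minimal Python sketch. The base URL `http://example.org/search` and the helper name `build_search_url` are made up for the example; only the parameter names come from the table above.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, sites=(), types=(),
                     num_hits=10, position=0, hits_per_site=1):
    """Build a search URL using the terse parameter names from the table.

    Hypothetical helper: base_url and the keyword names are assumptions.
    """
    params = [("q", query), ("n", num_hits), ("p", position), ("h", hits_per_site)]
    # Multi-valued parameters are simply repeated, once per value.
    params += [("s", site) for site in sites]
    params += [("t", content_type) for content_type in types]
    return base_url + "?" + urlencode(params)


url = build_search_url(
    "http://example.org/search",  # assumed endpoint, not a real deployment
    "internet archive",
    sites=["site1.org", "site2.net"],
    hits_per_site=0,
)
print(url)
```

Passing a list of pairs to `urlencode` keeps repeated keys such as `s` in the order given, which is why short names pay off when a parameter appears many times.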
User full-text query
| Param | Default | Multi? | Description |
|---|---|---|---|
| q | | No | query |
This is the query that the user types into the search box on the HTML page. The query only applies to the following fields:
- title
- content
- url
Furthermore, the user's query is "automagically" transformed into a low-level Lucene query that searches for all the query terms in those fields in various combinations, scoring more desirable combinations higher than others.
The idea is that it "just works". For example, if you search for Internet Archive, you should see results where both of those words appear, ideally together, but possibly some distance apart. The "automagic" query transformation does all this under the hood. See Don't phrase me, bro! for details.
Paging
| Param | Default | Multi? | Description |
|---|---|---|---|
| n | 10 | No | num hits |
| p | 0 | No | position |
These two parameters are used for paging through the results and usually are not manipulated by the end user directly.
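A front-end would typically derive these values from a page number. The sketch below assumes that p is a zero-based offset into the hit list (consistent with its default of 0), not a page index; the function name `page_params` is hypothetical.

```python
def page_params(page_number, hits_per_page=10):
    """Return the n and p parameters for a zero-based page number.

    Assumption: p counts hits (an offset), not pages.
    """
    return {"n": hits_per_page, "p": page_number * hits_per_page}


first_page = page_params(0)
third_page = page_params(2, hits_per_page=25)
```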
Site
| Param | Default | Multi? | Description |
|---|---|---|---|
| s | | Yes | site, default is all |
| h | 1 | No | max hits per site, 0=all |
These two parameters are often used in combination. The s parameter limits the search to specific sites, while the h parameter specifies the maximum number of hits to show from any one site.
Most of the time, users want to see results from all the sites, but at most 1 hit from each. That is the default.
However, if a user wants to see all the results from a certain site, or a small set of sites, they could use
?q=foo&s=site1.org&s=site2.net&h=0
This would limit the results to the two sites, showing all the hits from each.
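To show how the repeated s values and the h default might be read on the receiving side, here is a small Python sketch using the standard library. It is not the service's actual parsing code; the function name `read_site_params` is invented for the example.

```python
from urllib.parse import parse_qs


def read_site_params(query_string):
    """Extract the site list and per-site hit limit from a query string."""
    params = parse_qs(query_string.lstrip("?"))
    sites = params.get("s", [])  # empty list means "all sites"
    max_hits_per_site = int(params.get("h", ["1"])[0])  # documented default is 1
    return sites, max_hits_per_site


sites, max_hits = read_site_params("?q=foo&s=site1.org&s=site2.net&h=0")
```

Here `sites` holds both site names in request order and `max_hits` is 0, which the table defines as "all".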
Index and collection
| Param | Default | Multi? | Description |
|---|---|---|---|
| i | | Yes | index to search, default is all |
| c | | Yes | collection to search, default is all |
These two parameters might seem independent, but in practice they are intertwined.
If a deployment has only a single Lucene index, but that one index contains multiple collections, then the c parameter can limit results to one or more collections.
But, if a deployment uses different (on-disk) Lucene indexes to manage collections -- like Archive-It -- then the i parameter should be used to limit the search to just the desired index or indexes.
It's a potentially tricky situation for an end-user, but these should only be exposed through a nice UI/front-end that knows about the deployment and uses either c or i as is appropriate.
Lastly, these could be used together, with multiple collections spread across multiple indexes, but it could be difficult to keep track and manage the combination of the two parameters.
Content/document types
| Param | Default | Multi? | Description |
|---|---|---|---|
| t | | Yes | content type: text/html, application/pdf, etc. |
This parameter limits the results to documents whose type matches the given value. The parameter can be given multiple times.
Example:
?q=foo&t=application/pdf&t=application/x-pdf
The OpenSearch service doesn't have any particular knowledge of content types. It's up to the person building the index to canonicalize MIME types (assuming that's what you want).
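Canonicalization at index-build time could look something like the following Python sketch. The mapping table is purely illustrative and the names `CANONICAL_TYPES` and `canonicalize_type` are hypothetical; a real deployment would choose its own aliases.

```python
# Illustrative alias table; these entries are assumptions, not a fixed standard.
CANONICAL_TYPES = {
    "application/x-pdf": "application/pdf",
    "application/xhtml+xml": "text/html",
}


def canonicalize_type(raw_type):
    """Lower-case a MIME type, drop any parameters, and map known aliases."""
    base = raw_type.split(";", 1)[0].strip().lower()
    return CANONICAL_TYPES.get(base, base)


pdf_type = canonicalize_type("Application/X-PDF; charset=binary")
html_type = canonicalize_type("text/html; charset=utf-8")
```

With types canonicalized this way in the index, a single t value would be enough to match all PDF documents.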
XML Response
XML Namespaces
The OpenSearch specification declares an XML namespace for its extensions to RSS and Atom. Similarly, we declare a namespace for our extensions.
| Name | Namespace URI |
|---|---|
| OpenSearch | http://a9.com/-/spec/opensearchrss/1.0/ |
| The New Hotness | http://web.archive.org/-/spec/opensearchrss/1.0/ |
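A client reading the response needs to bind these namespace URIs to prefixes. The Python sketch below parses a small, made-up RSS response. The sample document is hypothetical, the prefixes `opensearch` and `tnh` are arbitrary choices, and the element names `totalResults`, `startIndex` and `itemsPerPage` are assumed from the OpenSearch 1.0 RSS extension; no TNH extension elements are shown because they are not described in this section.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "opensearch": "http://a9.com/-/spec/opensearchrss/1.0/",
    "tnh": "http://web.archive.org/-/spec/opensearchrss/1.0/",
}

# Hypothetical response body, for illustration only.
SAMPLE = """<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearchrss/1.0/">
  <channel>
    <title>foo</title>
    <opensearch:totalResults>42</opensearch:totalResults>
    <opensearch:startIndex>0</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <item><title>Example</title><link>http://example.org/</link></item>
  </channel>
</rss>"""


def parse_response(xml_text):
    """Return the total hit count and a list of (title, link) pairs."""
    channel = ET.fromstring(xml_text).find("channel")
    total = int(channel.findtext("opensearch:totalResults", namespaces=NAMESPACES))
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return total, items


total, items = parse_response(SAMPLE)
```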
