Wayback API

Request types within Wayback

  • Capture Requests - returns information about the various captures of a specific URL
  • URL Requests - returns information about URLs captured that begin with a particular prefix
  • Replay Requests - returns a specific resource from the archive based on a URL plus a date

Capture Request URL format

http://wwwoh-access.archive.org:8080/wwwoh/xmlquery?type=urlquery&url={URL}&startdate={DATE}&enddate={DATE}
  • url - the URL for which data should be returned. Ex. http://www.yahoo.com/
  • startdate (optional) - the earliest date boundary for which data should be returned. A partial timestamp (see below) is expanded to the earliest possible date it can represent.
  • enddate (optional) - the latest date boundary for which data should be returned. A partial timestamp (see below) is expanded to the latest possible date it can represent.
<wayback>
  <request>
    <resultsrequested>1000</resultsrequested>
    <startdate>19960101000000</startdate>
    <numresults>2</numresults>
    <type>urlquery</type>
    <enddate>20090605003233</enddate>
    <firstreturned>0</firstreturned>
    <url>enigmahistory.org/</url>
    <numreturned>2</numreturned>
    <resultstype>resultstypecapture</resultstype>
  </request>
  <results>
    <result>
      <capturedate>20020805101003</capturedate>
      <file>IA-WORLDWARS-ia400119.20080802024501.arc.gz</file>
      <urlkey>enigmahistory.org/</urlkey>
      <redirecturl>-</redirecturl>
      <url>http://www.enigmahistory.org:80/</url>
      <digest>54JBFQDKPNUQNUKYI4QT6DECP22VESEQ</digest>
      <compressedoffset>67083729</compressedoffset>
      <httpresponsecode>200</httpresponsecode>
      <mimetype>text/html</mimetype>
    </result>
    <result>
      <capturedate>20021128012534</capturedate>
      <file>IA-WORLDWARS-ia400119.20080802045222.arc.gz</file>
      <urlkey>enigmahistory.org/</urlkey>
      <redirecturl>-</redirecturl>
      <url>http://www.enigmahistory.org:80/</url>
      <digest>54JBFQDKPNUQNUKYI4QT6DECP22VESEQ</digest>
      <compressedoffset>17656542</compressedoffset>
      <httpresponsecode>200</httpresponsecode>
      <mimetype>text/html</mimetype>
    </result>
  </results>
</wayback>
  • wayback.request.url - canonicalized lookup version of requested URL (see Canonicalization below)
  • wayback.request.firstreturned - in paginated responses, record number of first result returned, zero-based
  • wayback.request.enddate - end date boundary of request, or end of current year if omitted
  • wayback.request.resultstype - string literal "resultstypecapture"
  • wayback.request.resultsrequested - maximum number of records to return in a single request
  • wayback.request.numresults - total number of results matching the query
  • wayback.request.type - string literal "urlquery"
  • wayback.request.startdate - start date boundary of request, or default for Wayback installation if omitted
  • wayback.request.numreturned - number of actual results returned in response
  • wayback.results.result.url - the closest approximation of the original request URL that can be reconstructed from index data alone
  • wayback.results.result.file - name of ARC/WARC file holding this resource
  • wayback.results.result.httpresponsecode - the server's HTTP response code for the original request
  • wayback.results.result.digest - MD5 or SHA1 digest of the HTTP payload of this resource
  • wayback.results.result.capturedate - 14-digit timestamp when this resource was captured
  • wayback.results.result.urlkey - canonicalized version of the original capture URL
  • wayback.results.result.compressedoffset - offset within the ARC/WARC file where this capture begins
  • wayback.results.result.mimetype - MIME type of the capture, as reported by the server's HTTP response headers
  • wayback.results.result.redirecturl - URL which this capture redirects to, or "-" if it does not redirect
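
The field paths above map directly onto the XML element nesting. The following is a minimal sketch of reading a capture response with Python's standard library. The function name parse_captures and the trimmed SAMPLE_RESPONSE are illustrative assumptions, not part of Wayback itself.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response shown above (one result only).
SAMPLE_RESPONSE = """<wayback>
  <request>
    <numresults>2</numresults>
    <type>urlquery</type>
    <url>enigmahistory.org/</url>
    <resultstype>resultstypecapture</resultstype>
  </request>
  <results>
    <result>
      <capturedate>20020805101003</capturedate>
      <urlkey>enigmahistory.org/</urlkey>
      <redirecturl>-</redirecturl>
      <url>http://www.enigmahistory.org:80/</url>
      <httpresponsecode>200</httpresponsecode>
      <mimetype>text/html</mimetype>
    </result>
  </results>
</wayback>"""


def parse_captures(xml_text):
    """Return (request_fields, captures) as plain dicts of strings."""
    root = ET.fromstring(xml_text)
    # wayback.request.* becomes one flat dict keyed by element name.
    request = {child.tag: child.text for child in root.find("request")}
    # Each wayback.results.result becomes its own dict.
    captures = [
        {child.tag: child.text for child in result}
        for result in root.findall("./results/result")
    ]
    return request, captures


request, captures = parse_captures(SAMPLE_RESPONSE)
for capture in captures:
    print(capture["capturedate"], capture["httpresponsecode"], capture["url"])
```

All values are kept as strings here; a caller that needs numbers can convert fields such as httpresponsecode or compressedoffset with int().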

Live web example:

http://wwwoh-access.archive.org:8080/wwwoh/xmlquery?type=urlquery&url=http://www.enigmahistory.org/
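
As an illustration of the request format, the sketch below assembles a query URL from its parameters. The helper name build_query_url is an assumption made for this example; the endpoint is the one shown in this document and may differ per Wayback installation.

```python
from urllib.parse import urlencode

# Endpoint copied from the request format in this document.
XMLQUERY_ENDPOINT = "http://wwwoh-access.archive.org:8080/wwwoh/xmlquery"


def build_query_url(url, query_type="urlquery", startdate=None, enddate=None):
    """Build an xmlquery URL; optional date bounds are omitted when None."""
    params = {"type": query_type, "url": url}
    if startdate is not None:
        params["startdate"] = startdate
    if enddate is not None:
        params["enddate"] = enddate
    # safe=":/" leaves the target URL unescaped, as in the examples here.
    return XMLQUERY_ENDPOINT + "?" + urlencode(params, safe=":/")


print(build_query_url("http://www.enigmahistory.org/"))
print(build_query_url("http://www.enigmahistory.org/", startdate="2002", enddate="2003"))
```

Passing query_type="prefixquery" produces the URL Request form with the same helper.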

URL Request URL format

http://wwwoh-access.archive.org:8080/wwwoh/xmlquery?type=prefixquery&url={URL}&startdate={DATE}&enddate={DATE}

  • url - the URL prefix for which data should be returned. Ex. http://www.yahoo.com/
  • startdate (optional) - the earliest date boundary for which data should be returned. A partial timestamp (see below) is expanded to the earliest possible date it can represent.
  • enddate (optional) - the latest date boundary for which data should be returned. A partial timestamp (see below) is expanded to the latest possible date it can represent.
<wayback>
  <request>
    <resultsrequested>1000</resultsrequested>
    <startdate>19960101000000</startdate>
    <numresults>53</numresults>
    <type>prefixquery</type>
    <enddate>20090605004205</enddate>
    <firstreturned>0</firstreturned>
    <url>enigmahistory.org/</url>
    <numreturned>53</numreturned>
    <resultstype>resultstypeurl</resultstype>
  </request>
  <results>
    <result>
      <numcaptures>35</numcaptures>
      <lastcapturets>20070814013921</lastcapturets>
      <numversions>1</numversions>
      <firstcapturets>20020805101003</firstcapturets>
      <urlkey>enigmahistory.org/</urlkey>
      <originalurl>http://www.enigmahistory.org:80/</originalurl>
    </result>
    <result>
      <numcaptures>18</numcaptures>
      <lastcapturets>20080108235127</lastcapturets>
      <numversions>1</numversions>
      <firstcapturets>20021212183427</firstcapturets>
      <urlkey>enigmahistory.org/booksreviews.html</urlkey>
      <originalurl>http://www.enigmahistory.org:80/booksreviews.html</originalurl>
    </result>
    ...
  </results>
</wayback>
  • wayback.request.* - same as urlquery definition, except:
  • wayback.request.type - string literal "prefixquery"
  • wayback.request.resultstype - string literal "resultstypeurl"
  • wayback.results.result.urlkey - canonicalized version of the original capture URL
  • wayback.results.result.numversions - number of unique digests across all captures of this URL
  • wayback.results.result.numcaptures - total number of captures of this URL
  • wayback.results.result.firstcapturets - timestamp of first capture of this URL within requested date boundaries
  • wayback.results.result.originalurl - the closest approximation of the original request URL that can be reconstructed from index data alone
  • wayback.results.result.lastcapturets - timestamp of last capture of this URL within requested date boundaries
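
A prefix query response can be read into typed records so that the count fields are usable as numbers. This is a rough sketch only: the UrlSummary class and parse_url_results function are hypothetical names, and the sample is a trimmed copy of the response above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class UrlSummary:
    urlkey: str
    originalurl: str
    numcaptures: int
    numversions: int
    firstcapturets: str
    lastcapturets: str


def parse_url_results(xml_text):
    """Convert each <result> of a prefixquery response into a UrlSummary."""
    root = ET.fromstring(xml_text)
    summaries = []
    for result in root.findall("./results/result"):
        summaries.append(UrlSummary(
            urlkey=result.findtext("urlkey"),
            originalurl=result.findtext("originalurl"),
            numcaptures=int(result.findtext("numcaptures")),
            numversions=int(result.findtext("numversions")),
            firstcapturets=result.findtext("firstcapturets"),
            lastcapturets=result.findtext("lastcapturets"),
        ))
    return summaries


# Two results taken from the sample response shown above.
SAMPLE_PREFIX_RESPONSE = """<wayback>
  <results>
    <result>
      <numcaptures>35</numcaptures>
      <lastcapturets>20070814013921</lastcapturets>
      <numversions>1</numversions>
      <firstcapturets>20020805101003</firstcapturets>
      <urlkey>enigmahistory.org/</urlkey>
      <originalurl>http://www.enigmahistory.org:80/</originalurl>
    </result>
    <result>
      <numcaptures>18</numcaptures>
      <lastcapturets>20080108235127</lastcapturets>
      <numversions>1</numversions>
      <firstcapturets>20021212183427</firstcapturets>
      <urlkey>enigmahistory.org/booksreviews.html</urlkey>
      <originalurl>http://www.enigmahistory.org:80/booksreviews.html</originalurl>
    </result>
  </results>
</wayback>"""

for summary in parse_url_results(SAMPLE_PREFIX_RESPONSE):
    print(summary.urlkey, summary.numcaptures, summary.firstcapturets)
```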

Live web example:

http://wwwoh-access.archive.org:8080/wwwoh/xmlquery?type=prefixquery&url=http://www.enigmahistory.org/

Replay Request URL format

http://wwwoh-access.archive.org:8080/wwwoh/replay?url={URL}&date={DATE}
  • url - the URL which should be replayed. Ex. http://www.yahoo.com/
  • date - the capture date identifying the particular version of the URL to return, specified as a timestamp. Partial timestamps are interpreted as the earliest capture.

If the specified date does not exactly match a capture date for the URL, the client will be redirected to the closest date that was actually captured.
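
To make the redirect behaviour concrete, the sketch below picks the capture nearest to a requested date from a list of capture timestamps. It is an assumption about how "closest" might be computed (absolute time difference, first match wins on a tie); this document does not specify the server's exact rule, and closest_capture is a hypothetical helper.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse a full 14-digit timestamp into a datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def closest_capture(requested, capture_dates):
    """Return the capture timestamp nearest in time to the requested one.

    Assumption: distance is the absolute time difference; on a tie the
    capture listed first is returned.
    """
    target = parse_ts(requested)
    return min(capture_dates, key=lambda ts: abs(parse_ts(ts) - target))


# Capture dates taken from the sample urlquery response in this document.
captures = ["20020805101003", "20021128012534"]
print(closest_capture("20021101000000", captures))
```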

Documents returned may be altered depending on the configuration of the Wayback installation. In some configurations the resource is returned without any modification; in others, HTTP headers or HTML content may be altered to improve the in-browser replay experience.

Live web example:

http://wwwoh-access.archive.org:8080/wwwoh/replay?date=20070814013921&url=http://www.enigmahistory.org/

Timestamp format

A timestamp is a 14-digit representation of a specific second in time, expressed in UTC:

YYYYMMDDHHmmss
  • YYYY - year, ex. 1999, 2004
  • MM - month, 01 = Jan, 12 = Dec
  • DD - day of month, 1-based
  • HH - hour of day, 0-based, 01 = 1 AM, 13 = 1 PM
  • mm - minute of hour, 0-based
  • ss - second of minute, 0-based

If a timestamp has fewer than 14 digits, it is interpreted as either the earliest or the latest possible moment it can represent, depending on the context. The timestamp "1999" interpreted as the earliest date becomes "19990101000000". The timestamp "1999" interpreted as the latest date becomes "19991231235959".
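
The expansion rule can be sketched as follows. The function name expand_timestamp is hypothetical, and the sketch assumes a partial timestamp always ends on a field boundary (4, 6, 8, 10, 12 or 14 digits); how Wayback treats other lengths is not stated here.

```python
import calendar


def expand_timestamp(partial, latest=False):
    """Pad a partial timestamp to 14 digits.

    latest=False gives the earliest moment the partial value can mean,
    latest=True gives the latest one.
    """
    if not (partial.isdigit() and 4 <= len(partial) <= 14 and len(partial) % 2 == 0):
        raise ValueError("expected 4 to 14 digits in whole fields: " + repr(partial))
    if not latest:
        # Missing month and day default to 01, missing time fields to 00.
        return partial + "00000101000000"[len(partial):]
    if len(partial) < 8:
        # Missing month defaults to December; missing day to the last
        # day of that month (leap years are handled by calendar).
        year = int(partial[:4])
        month = int(partial[4:6]) if len(partial) >= 6 else 12
        last_day = calendar.monthrange(year, month)[1]
        partial = f"{year:04d}{month:02d}{last_day:02d}"
    # Missing time fields default to 23:59:59.
    return partial + "00000000235959"[len(partial):]


print(expand_timestamp("1999"))               # 19990101000000
print(expand_timestamp("1999", latest=True))  # 19991231235959
```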

URL Canonicalization

Wayback performs several URL normalization, or canonicalization, operations on URLs before they are inserted into the Wayback index. The same operations are performed on URLs before searching the Wayback index. Examples of canonicalization operations are:

  • removal of leading "www." from hostnames
  • lowercasing host and/or path
  • collapsing redundant path components, ex. "/images/../foo.gif" = "/foo.gif", "/./foo.gif" = "/foo.gif"
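
The three operations listed above can be approximated as below. This is not Wayback's actual canonicalizer: the function canonicalize is a hypothetical helper, and dropping the scheme and port is an assumption inferred from the sample urlkey values in this document (for example "http://www.enigmahistory.org:80/" indexed as "enigmahistory.org/").

```python
import posixpath
from urllib.parse import urlsplit


def canonicalize(url):
    """Approximate a Wayback lookup key for an absolute URL with a scheme."""
    parts = urlsplit(url)
    # .hostname is already lowercased and excludes the port.
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    raw_path = parts.path or "/"
    # normpath collapses "." and ".." components.
    path = posixpath.normpath(raw_path)
    # normpath drops a trailing slash; restore it if the input had one.
    if raw_path.endswith("/") and not path.endswith("/"):
        path += "/"
    key = host + path.lower()
    if parts.query:
        key += "?" + parts.query
    return key


print(canonicalize("http://www.enigmahistory.org:80/"))
print(canonicalize("http://WWW.Example.com/images/../Foo.gif"))
```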