Commit b2e635a

Improve grammar of draft spec
1 parent 7f7cd37 commit b2e635a

File tree

1 file changed

+5
-5
lines changed
  • request-body-canonicalization/latest

request-body-canonicalization/latest/index.md

Lines changed: 5 additions & 5 deletions
@@ -2,7 +2,7 @@
22

33
## Abstract
44

5-
Originally CDX files were only used to index web archives containing GET requests. As browser-based capture methods can record non-GET requests such as those generated by JavaScript, a way for CDX/CDXJ index records to differentiate based on request method and request body is needed. This document describes the mechanism used for encoding the request method and body in the CDX/CDXJ key by appending additional query parameters, as originally implemented by pywb.
5+
Originally, CDX files were only used to index web archives containing GET requests. As browser-based capture methods can record non-GET requests such as those generated by JavaScript, a way for CDX/CDXJ index records to differentiate based on request method and request body is needed. This document describes the mechanism used for encoding the request method and body in the CDX/CDXJ key by appending additional query parameters, as originally implemented by pywb.
66

77
## Conformance
88

@@ -23,18 +23,18 @@ The key words MAY and MUST in this document are to be interpreted as described i
2323

2424
Web archiving data is often stored in specialized formats, which include a full record of the HTTP network traffic as well as additional metadata. The archived data is often accessed via random-access, loading the appropriate chunks of data based on URLs requested by end users.
2525

26-
This specification is designed to describe how to store two key file formats used for web archives:
26+
Web archiving data is often stored in two key file formats:
2727

2828
1. WARC — A widely accepted [ISO standard][3] used by many institutions around the world for storing web archive data.
2929
2. WACZ — A new format [developed by Webrecorder][4] for packaging WARCs with other web archive data which supports random-access reads.
3030

3131
Both formats are 'composite' formats, containing smaller amounts of data interspersed with metadata. In the case of WARC, the format consists of concatenated records which are appended one after the other, eg. `cat A.warc B.warc > C.warc`. The WARCs may or may not be gzipped, in which case the result is a multi-member gzip.
3232

33-
WACZ files use the ZIP format which contains a specialized file and directory layout. ZIP is also a composite format, containing the raw (sometimes compressed) data as well as header data which contains the location files and directories within the ZIP file.
33+
WACZ files use the ZIP format, which contains a specialized file and directory layout. ZIP is also a composite format, containing the raw (sometimes compressed) data as well as header data which contains the location files and directories within the ZIP file.
3434

3535
## Web Archive Index Formats (CDX and CDXJ)
3636

37-
Web archive search and retrieval is frequently intermediated by index files of WARC data, in the CDX or CDXJ formats. WACZ files contain CDXJ indices, which may or may not be gzipped, within the ZIP file that comprises the WACZ.
37+
Web archive search and retrieval is frequently intermediated by index files of WARC data in the CDX or CDXJ formats. WACZ files contain CDXJ indices, which may or may not be gzipped, within the ZIP file that comprises the WACZ.
3838

3939
### CDX
4040

@@ -70,7 +70,7 @@ The JSON Block contains a serialized [JSON][7] object with newlines escaped so t
7070

7171
### Motivation
7272

73-
POST-canonicalization provides a standardized way of representing a non-GET HTTP request as a GET request for indexing and playback in web archives. The original HTTP request type as well as the encoded request body are appended to the original URL and included in CDX/CDXJ indices as the Searchable URL. This allows web archive playback engines to then reconstruct the original non-GET requests for use in playback with their original HTTP method and request body.
73+
Request body canonicalization provides a standardized way of representing a non-GET HTTP request as a GET request for indexing and playback in web archives. The original HTTP request type as well as the encoded request body are appended to the original URL and included in CDX/CDXJ indices as the Searchable URL. This allows web archive playback engines to then reconstruct the original non-GET requests for use in playback with their original HTTP method and request body.
7474

7575
### Encoding the request method
7676
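
The Motivation paragraph in the last hunk describes appending the request method and the encoded request body to the original URL as extra query parameters. A minimal sketch of that idea follows. It assumes the method parameter is named `__wb_method`, as in pywb; the parameter name and the body encoding rules are not shown in this diff excerpt, so the helper `append_method_and_body` is hypothetical and takes a body that is already encoded as a query string.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: parameter name borrowed from pywb; not confirmed by this diff.
METHOD_PARAM = "__wb_method"


def append_method_and_body(url: str, method: str, body_query: str = "") -> str:
    """Return `url` with the request method and an already-encoded body
    appended as query parameters. GET requests are returned unchanged."""
    method = method.upper()
    if method == "GET":
        return url

    parts = urlsplit(url)
    extra = f"{METHOD_PARAM}={method}"
    if body_query:
        extra += "&" + body_query

    # Preserve any existing query string and append the new parameters.
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
```

For example, `append_method_and_body("https://example.com/api?x=1", "post", "a=b")` yields a single GET-style URL carrying both the method and the body, which an indexer could then use as the basis for the Searchable URL.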
