HTTP Learning Notes 1_Basic Knowledge

Free  2015-05-24  #HTTP #http-introduction #http-basics #http-learning

The HTTP protocol is of utmost importance and needs to be studied carefully.

0. Additional Basic Knowledge

OSI (Open Systems Interconnection) Reference Model, listed from the top layer down:

  1. Application Layer

Provides network communication interfaces for applications

  2. Presentation Layer

  3. Session Layer

  4. Transport Layer

Data transmission unit is the segment (TCP) or datagram (UDP)

  5. Network Layer

Data transmission unit is the packet

  6. Data Link Layer

Data transmission unit is the frame

  7. Physical Layer

Data transmission unit is the bit

The seven OSI layers are a theoretical model used for study. It is too cumbersome to apply in practice, which is why the TCP/IP reference model exists:

P.S. TCP/IP here refers to the whole protocol suite (protocol family), which contains many network protocols such as SNMP, ICMP, UDP, and DNS.

  1. Application Layer

Such as HTTP, FTP, DNS

  2. Transport Layer

Such as TCP, UDP

  3. Network Layer

Such as IP

  4. Data Link Layer

HTTP is an application layer protocol. Underneath it uses TCP at the transport layer and IP at the network layer, so every HTTP exchange depends on a TCP connection, which is opened with a three-way handshake and closed with a four-way handshake (often called the "four-way wave").

Establishing and releasing connections therefore has a cost. As a performance optimization, HTTP/1.1 uses persistent connections by default, while HTTP/1.0 does not.
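
The default described above can be sketched as a small decision function. This is only an illustration: the function name `is_persistent` is hypothetical, and the handling of `Connection: keep-alive` for HTTP/1.0 reflects a widely used extension that these notes do not cover.

```python
def is_persistent(http_version: str, connection_header: str = "") -> bool:
    """Return True if the connection should stay open after this exchange."""
    # The Connection header is a comma-separated list of case-insensitive tokens.
    tokens = {t.strip().lower() for t in connection_header.split(",") if t.strip()}
    if "close" in tokens:
        return False  # either side can opt out explicitly
    if http_version == "HTTP/1.1":
        return True   # persistent by default
    # HTTP/1.0 (assumption: keep-alive extension) needs an explicit request
    return "keep-alive" in tokens


print(is_persistent("HTTP/1.1"))                # True
print(is_persistent("HTTP/1.1", "close"))       # False
print(is_persistent("HTTP/1.0"))                # False
print(is_persistent("HTTP/1.0", "Keep-Alive"))  # True
```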

It also helps to distinguish a few concepts:

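  • Proxy

An intermediary application that forwards client requests to the server and server responses back to the client

  • Gateway

A server that relays requests to other servers; to the client it looks like the origin server that holds the resource

  • Tunnel

An application that relays the connection between a Client and a Server without interpreting the data passing through it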

I. URI and URL

  • URI (Uniform Resource Identifier): A string that identifies a specific internet resource

  • URL (Uniform Resource Locator): A string that identifies a specific internet resource by its location

A URL includes a scheme, a host, and a URL path

So every URL is a URI: URLs are a subset of URIs.
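
As a quick illustration, Python's standard `urllib.parse.urlsplit` breaks a URL into the parts named above. The example addresses are made up; the URN line is an addition to these notes, included only to show a URI that names a resource without giving a location.

```python
from urllib.parse import urlsplit

# A URL: scheme + host (+ optional port) + path, optionally query and fragment.
parts = urlsplit("http://www.example.com:8080/docs/index.html?lang=en#top")
print(parts.scheme)    # http
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /docs/index.html
print(parts.query)     # lang=en
print(parts.fragment)  # top

# A URN is a URI that is not a URL: it has a scheme but no host to contact.
urn = urlsplit("urn:isbn:0451450523")
print(urn.scheme, urn.netloc == "")  # urn True
```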

II. HTTP Methods

  • GET: Get resources

Supported in 1.0/1.1, commonly used

  • POST: Transmit entity body

Supported in 1.0/1.1, commonly used

  • PUT: Transmit files

Supported in 1.0/1.1. Websites generally do not enable this method because PUT itself has no authentication mechanism: anyone could upload files, which is a security risk.

  • HEAD: Get message headers (like GET, but the response carries no body)

Supported in 1.0/1.1, commonly used

  • DELETE: Delete files

Supported in 1.0/1.1, has the same security problem as the PUT method

  • OPTIONS: Ask about supported methods

Supported in 1.1

  • TRACE: Trace path

Supported in 1.1, used with the Max-Forwards header field to check the path a request takes

  • CONNECT: Ask a proxy to establish a tunnel

Supported in 1.1, mainly used with SSL (Secure Sockets Layer) and TLS (Transport Layer Security) to carry encrypted TCP traffic through the tunnel

  • (LINK): Establish a link between resources

Supported in 1.0, dropped in 1.1

  • (UNLINK): Remove a link between resources

Supported in 1.0, dropped in 1.1
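
To make the difference between methods concrete, here is a minimal sketch that serializes request messages by hand. The helper `build_request` and the host name are hypothetical and exist only for illustration; a real client would use a library such as `http.client`.

```python
def build_request(method: str, target: str, host: str, body: bytes = b"") -> bytes:
    """Serialize a minimal HTTP/1.1 request message."""
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}"]
    if body:
        # Methods that transmit an entity body (POST, PUT) announce its size.
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"  # the empty line ends the header
    return head.encode("ascii") + body


get_req = build_request("GET", "/index.html", "www.example.com")
post_req = build_request("POST", "/form", "www.example.com", b"name=alice")
# OPTIONS with "*" asks about the server as a whole rather than one resource.
options_req = build_request("OPTIONS", "*", "www.example.com")

print(post_req.decode("ascii"))
```

GET and OPTIONS here consist of a header only, while POST appends the entity body after the empty line.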

III. HTTP Status Codes

  • 1XX: Informational

Request is being processed

-  101: Switching Protocols, used together with the Upgrade header field when the server agrees to switch protocols, not very common

  • 2XX: Success

Request processed normally

-  200: OK, request processed normally

-  204: No Content, request processed successfully but no resources to return (response body is empty)

-  206: Partial Content, the server returns only the part of the resource asked for by a range request; Content-Range in the response header indicates the range

  • 3XX: Redirection

Additional operations needed to complete request

-  301: Moved Permanently, permanent redirection, the browser should update bookmarks to the new URI

-  302: Found, temporary redirection, bookmarks should not be updated

-  303: See Other, similar to 302, but the new URL must be fetched with the GET method

    Note: The specification originally required the request method to stay unchanged on 301 and 302 (a POST should be repeated as a POST), but the *de facto standard* is that almost all browsers switch a POST to GET on 301 and 302, just as they do on 303

-  304: Not Modified, returned for a conditional request (If-None-Match or If-Modified-Since) when the resource has not changed; the response has no body and the client reuses its cached copy. Failed If-Match and If-Unmodified-Since conditions produce 412 instead

    Note: Although 304 belongs to 3XX, it *has nothing to do with redirection*

-  307: Temporary Redirect, temporary redirection, same meaning as 302

    This status code was introduced to correct the de facto standard: the client must repeat the request with the original method, which is how 302 was originally meant to work. Current mainstream browsers keep the method on 307

  • 4XX: Client Error

The request contains an error, so the server cannot process it

-  400: Bad Request, request message has syntax errors

-  401: Unauthorized, authentication is required (BASIC or DIGEST authentication) or the previous authentication attempt failed

-  403: Forbidden, access request to specified resource is rejected

-  404: Not Found, the server cannot find the requested resource; also used in place of 403 when the server does not want to reveal why it refused

-  405: Method Not Allowed, the method is not allowed for the requested resource, not very common

-  412: Precondition Failed, a condition in the request headers evaluated to false, not very common

-  417: Expectation Failed, the server cannot meet the Expect header field, not very common

  • 5XX: Server Error

The server failed while processing the request

-  500: Internal Server Error, an error occurred while the server was executing the request

-  503: Service Unavailable, server overloaded or under maintenance
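
The status code classes and the redirect behavior discussed above can be sketched as follows. This is a simplified model, not a description of any particular browser: the function names are hypothetical, and `method_after_redirect` encodes the de facto behavior for 301/302 together with the method-preserving behavior the specification intends for 307.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Map a status code to its class using the first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


def method_after_redirect(status: int, method: str) -> str:
    """Method a typical client uses for the follow-up request (simplified)."""
    if status == 303 and method != "HEAD":
        return "GET"   # 303 always asks for a GET on the new URL
    if status in (301, 302) and method == "POST":
        return "GET"   # de facto behavior, contrary to the original rule
    return method      # 307 (and any other case) keeps the original method


print(status_class(404))                   # Client Error
print(method_after_redirect(302, "POST"))  # GET
print(method_after_redirect(307, "POST"))  # POST
```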

IV. HTTP Message Header

HTTP Message = Message Header + Empty Line (CR+LF) + Message Body

             = Start Line (Request Line/Status Line) + Header Fields + Empty Line + Message Body

             = Start Line + Request/Response Header Fields + General Header Fields + Entity Header Fields + Empty Line + Message Body

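P.S. Don't underestimate this empty line (CR+LF). HTTP header injection attacks work by smuggling CR+LF sequences into header values, so that attacker-supplied text is treated as extra header fields or as the start of the message body.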
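
A common defense against header injection is to refuse any header value that contains CR or LF. The sketch below shows the idea with two hypothetical helpers (`safe_header_value`, `build_location_header`); the `Set-Cookie` payload is just an example of what an attacker might try to smuggle in.

```python
def safe_header_value(value: str) -> str:
    """Reject values that could end the header line early."""
    if "\r" in value or "\n" in value:
        raise ValueError("CR/LF not allowed in header value")
    return value


def build_location_header(target: str) -> str:
    """Build a Location header line from untrusted input."""
    return f"Location: {safe_header_value(target)}\r\n"


print(repr(build_location_header("/home")))

try:
    # Without the check, this would inject a second header field.
    build_location_header("/home\r\nSet-Cookie: sid=evil")
except ValueError as exc:
    print("rejected:", exc)
```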

1. Start Line

The start line is either a request line or a status line (for HTTP request messages and response messages respectively):

  • Request Line: States the method, request URI, and HTTP version of the request

  • Status Line: States the HTTP version, status code, and reason phrase of the response
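
The message layout and the two kinds of start line can be sketched with a small hand-written parser. The function names are hypothetical and the parser is deliberately naive (no folded headers, no repeated fields); note that a status line also carries a reason phrase after the status code.

```python
def parse_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # first empty line ends the header
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return start_line, headers, body


def parse_request_line(line: str):
    """Return (method, request URI, HTTP version)."""
    method, target, version = line.split(" ", 2)
    return method, target, version


def parse_status_line(line: str):
    """Return (HTTP version, status code, reason phrase)."""
    parts = line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""
    return parts[0], int(parts[1]), reason


raw = b"HTTP/1.1 404 Not Found\r\nContent-Length: 9\r\n\r\nNot here."
start, headers, body = parse_message(raw)
print(parse_status_line(start))  # ('HTTP/1.1', 404, 'Not Found')
print(headers, body)
```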

2. Header Fields

Request Header Fields

Header Field Name     Description
Accept                Media types the user agent can handle
Accept-Charset        Preferred character set
Accept-Encoding       Preferred content encoding
Accept-Language       Preferred language (natural language)
Authorization         Web authentication information
Expect                Expect specific behavior from the server
From                  User's email address
Host                  Server where the requested resource is located
If-Match              Compare entity tag (ETag)
If-Modified-Since     Compare resource update time
If-None-Match         Compare entity tag (opposite of If-Match)
If-Range              Send a byte range request for the entity if the resource has not been updated
If-Unmodified-Since   Compare resource update time (opposite of If-Modified-Since)
Max-Forwards          Maximum transmission hop count
Proxy-Authorization   Authentication information the proxy server requires from the client
Range                 Entity byte range request
Referer               URI of the page from which the request originated
TE                    Transfer encoding priority
User-Agent            HTTP client program information

Response Header Fields

Header Field Name     Description
Accept-Ranges         Whether byte range requests are accepted
Age                   Estimated time elapsed since the response was generated at the origin server
ETag                  Resource matching information
Location              Redirect the client to the specified URI
Proxy-Authenticate    Proxy server authentication information for the client
Retry-After           Timing requirement for initiating the request again
Server                HTTP server software information
Vary                  Proxy server cache management information
WWW-Authenticate      Server authentication information for the client

General Header Fields

Header Field Name     Description
Cache-Control         Control cache behavior
Connection            Hop-by-hop headers, connection management
Date                  Date and time the message was created
Pragma                Message instructions
Trailer               Header list at the end of the message
Transfer-Encoding     Specify transfer encoding method for the message body
Upgrade               Upgrade to another protocol
Via                   Proxy server related information
Warning               Error notification

Entity Header Fields

Header Field Name     Description
Allow                 HTTP methods the resource supports
Content-Encoding      Encoding method applied to the entity body
Content-Language      Natural language of the entity body
Content-Length        Size of the entity body (unit: bytes)
Content-Location      Alternate URI for the returned resource
Content-MD5           Message digest of the entity body
Content-Range         Position range of the entity body
Content-Type          Media type of the entity body
Expires               Expiration date and time of the entity body
Last-Modified         Last modification date and time of the resource
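
Several of these fields work in pairs: ETag with If-None-Match, and Last-Modified with If-Modified-Since. The sketch below shows how a server might decide between 200 and 304 for a conditional GET. It is a simplification under stated assumptions: the function name is hypothetical, header names are looked up with exact case, and entity tags are compared as plain strings rather than with the full weak/strong comparison rules.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, else 200."""
    inm = request_headers.get("If-None-Match")
    if inm is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = [tag.strip() for tag in inm.split(",")]
        return 304 if inm.strip() == "*" or etag in candidates else 200
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        changed = parsedate_to_datetime(last_modified) > parsedate_to_datetime(ims)
        return 200 if changed else 304
    return 200  # unconditional request: send the full response


etag = '"abc123"'
modified = "Wed, 21 Oct 2015 07:28:00 GMT"
print(conditional_status({"If-None-Match": '"abc123"'}, etag, modified))  # 304
print(conditional_status({"If-None-Match": '"old"'}, etag, modified))     # 200
print(conditional_status({"If-Modified-Since": modified}, etag, modified))  # 304
```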

P.S. For other header fields and more detailed header field information, please check Cnblogs: HTTP Message

References:

  • "Illustrated HTTP"
