End-to-end security in the World Wide Web

The IETF has recognized pervasive monitoring as an attack and documented this in RFC 7258. The IAB recommends preferring encrypted protocols over cleartext operation in its Statement on Internet Confidentiality. For HTTP/2, several browser vendors have stated that they will support it only over TLS, i.e. encrypted.

According to the people working on TLS, TLS provides end-to-end security; see for example Adam Langley's presentation at IETF 90. TLS can be deployed hop-by-hop, as in email, or end-to-end, as in HTTP. RFC 2817, "Upgrading to TLS Within HTTP/1.1", describes how the CONNECT method is used to establish an end-to-end tunnel through proxies when there is no direct connectivity. But does this give end-to-end security in the web?
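As a rough illustration of the tunnel request, the following sketch builds the bytes a client would send to a proxy before starting the TLS handshake with the target. The function name and the host www.example.org are assumptions made for this example; proxy authentication and error handling are left out.

```python
def build_connect_request(target_host: str, target_port: int = 443) -> bytes:
    """Build a minimal HTTP/1.1 CONNECT request for a proxy (illustrative only)."""
    authority = f"{target_host}:{target_port}"
    lines = [
        f"CONNECT {authority} HTTP/1.1",  # ask the proxy to open a raw tunnel
        f"Host: {authority}",
        "",  # empty line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    # The proxy sees only the host and port, not the URL path.
    print(build_connect_request("www.example.org").decode("ascii"))
```

After the proxy answers with a 2xx response it only relays bytes, so the TLS handshake that follows runs between the browser and the target.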

Typing an https URL into a browser opens a connection to the target and starts a TLS handshake, during which the browser checks that the server certificate is valid and that the server really is the target. The page is then loaded over the secured connection. Here the first glitches appear. Routing and load balancing may deliver the connection to different servers. The end point in the user's domain is fixed, namely the browser, but the end point on the server side is flexible and can be configured by the content provider. One can argue, however, that companies will do this carefully and never leak secret keys to anybody else.
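The check a browser performs during the handshake can be approximated with Python's standard ssl module. This is a minimal sketch, not how any particular browser is implemented; the function names are made up for this example, and fetch_peer_certificate needs network access when called.

```python
import socket
import ssl


def make_verifying_context() -> ssl.SSLContext:
    """Return a client context that validates the chain and the host name."""
    # The default context for server authentication already requires a valid
    # certificate and checks it against the requested host name.
    return ssl.create_default_context()


def fetch_peer_certificate(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect, perform the TLS handshake and return the validated certificate."""
    ctx = make_verifying_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # server_hostname is the name the certificate is checked against.
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()
```

Note that the check only proves that whoever answers holds a key for the requested name. It says nothing about which machine behind a load balancer that is.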

Load balancing often needs to be based on the URL. This requires terminating the TLS connection at the load balancer, which then forwards the request to the server handling it. End users have no control over how load balancers or servers forward the request. So although usually only the server certificate is presented and checked, it is the server side whose end point is flexibly assigned.
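Why URL-based balancing forces TLS termination can be seen in a small sketch: the path is part of the HTTP request line, which travels inside the encrypted stream, so the balancer needs the decrypted request head before it can choose a backend. The function name, the route table and the pool names below are hypothetical.

```python
def pick_backend(request_head: bytes, routes: dict[str, str], default: str) -> str:
    """Choose a backend pool by longest matching path prefix (illustrative only)."""
    # request_head is the *decrypted* start of the HTTP request.
    request_line = request_head.split(b"\r\n", 1)[0].decode("ascii")
    _method, target, _version = request_line.split(" ", 2)
    path = target.split("?", 1)[0]  # drop the query string

    best = ""
    for prefix in routes:
        if path.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    return routes[best] if best else default


if __name__ == "__main__":
    routes = {"/images/": "img-pool", "/api/": "api-pool"}  # hypothetical pools
    head = b"GET /api/users?id=1 HTTP/1.1\r\nHost: www.example.org\r\n\r\n"
    print(pick_backend(head, routes, "web-pool"))
```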

When the browser receives the response from the server, it becomes even more troublesome. A returned HTML page can contain many subresources, which the browser loads one by one. These resources don't need to be served by the same server or company. The domain name may be different, and when this happens additional TLS connections are opened. The server certificates for those connections are checked, but they have nothing to do with the name shown in the browser window. Even worse, in the requests sent over these connections the browser adds, in the Referer header, the URL that caused the resource to be loaded. In addition, the browser adds cookies.
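To see how many other parties a single page involves, one can list the hosts of its subresources. The sketch below uses only the Python standard library; the class and function names, the handful of tags it looks at, and the example hosts are assumptions, and comparing exact host names is a simplification of what "same company" means.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class SubresourceCollector(HTMLParser):
    """Collect absolute URLs of a few common subresource types."""

    TAG_ATTRS = {"img": "src", "script": "src", "iframe": "src", "link": "href"}

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        wanted = self.TAG_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page URL.
                self.urls.append(urljoin(self.page_url, value))


def third_party_hosts(page_url: str, html: str) -> list[str]:
    """Return the sorted host names of subresources not served by the page's host."""
    collector = SubresourceCollector(page_url)
    collector.feed(html)
    page_host = urlsplit(page_url).hostname
    hosts = {urlsplit(url).hostname for url in collector.urls}
    return sorted(h for h in hosts if h and h != page_host)


if __name__ == "__main__":
    page = "https://shop.example/cart?item=42"
    markup = (
        '<img src="/logo.png">'
        '<script src="https://tracker.example/t.js"></script>'
        '<link rel="stylesheet" href="https://cdn.example/s.css">'
    )
    print(third_party_hosts(page, markup))
```

Each host in the result gets its own TLS connection and its own certificate check, and each may learn the page URL through the Referer header.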

So when a secret service wants to follow what content is transferred, it is enough to ask a tracking company or, even better, to found one of its own. Browsers will then deliver the desired information securely to the secret service, i.e. others may not be able to read it. For content which doesn't require authentication, the secret service can then issue a simple HTTP request for the reported URL and receive the cleartext content.

While content providers can operate more or less unchanged over secured connections, this isn't necessarily true for end users. For example, virus scanners need access to the cleartext content. The same holds for a cache. Similarly, parental controls or ad blockers may want to see the URL in order to block blacklisted ones. End users could easily deploy applications like these on home routers, but since the user end point of the TLS connection is fixed in the browser, this will not work.

Using TLS seems to be a step in the right direction, but parts are still missing to provide more privacy and security.

See also HTTP2 and Free Speech