
WebSocket is neither Web nor Socket

The HTML 5 proposal contains many new and interesting ideas. In particular, we will be discussing WebSocket at the WebBuilder panel in Silicon Valley (so please send in your questions). To provide some background, let's take a closer look at WebSocket and weigh it against possible small changes in how HTTP is used (such as keeping TCP connections separate when "window" objects are different).

In essence, WebSocket works as follows: after an HTTP connection is established, the client sends an HTTP "Upgrade: WebSocket" request, after which the underlying TCP connection is used for bidirectional streaming of 0xFF-delimited UTF-8 messages. Let's now look at the specifics in more detail:
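A minimal Python sketch of the client side of this upgrade, assuming a plain blocking socket and a server that answers with a status line beginning "HTTP/1.1 101". The function names and the Host header are illustrative assumptions, not taken from the draft.

```python
import socket


def build_upgrade_request(host: str, path: str) -> bytes:
    """Build a client handshake: a GET request asking to upgrade to WebSocket."""
    lines = [
        f"GET {path} HTTP/1.1",
        "Upgrade: WebSocket",
        "Connection: Upgrade",
        f"Host: {host}",  # assumption: ordinary HTTP/1.1 Host header
        "",  # an empty line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def read_headers(sock: socket.socket) -> bytes:
    """Read until the blank line that ends an HTTP header block.

    Simplification: bytes received after the blank line already belong to the
    message stream and are discarded here; a real client must keep them.
    """
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(1024)
        if not chunk:
            raise ConnectionError("connection closed during handshake")
        data += chunk
    return data.split(b"\r\n\r\n", 1)[0]


def client_handshake(sock: socket.socket, host: str, path: str) -> bool:
    """Send the upgrade request; True if the server switches protocols (101)."""
    sock.sendall(build_upgrade_request(host, path))
    status_line = read_headers(sock).split(b"\r\n", 1)[0]
    return status_line.startswith(b"HTTP/1.1 101")
```

After a successful handshake, the same socket carries the framed messages in both directions; nothing else about the TCP connection changes.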

Does WebSocket use TCP ports 81 and 815?

Using new ports requires firewall and proxy configuration, which many IT administrators will resist. Moreover, these ports appear to have been proposed without consulting IANA. The ports seem to be available, but IANA should be properly consulted before well-known ports are assigned. The most likely outcome for widespread use is that WebSocket will "upgrade" port 80, as shown below.

How does WebSocket use an HTTP connection on port 80?

20 48 54 54 50 2f 31 2e  31 0d 0a 55 70 67 72 61
64 65 3a 20 57 65 62 53 6f 63 6b 65 74 0d 0a 43
6f 6e 6e 65 63 74 69 6f 6e 3a 20 55 70 67 72 61
64 65 0d 0a

For documentation purposes, it might make sense to simply present the client-server exchange in ASCII rather than specifying the exact byte sequence used to talk to the remote HTTP server for "Upgrade: WebSocket". Note that the flexibility of HTTP is being put to good use here.
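Decoding the bytes makes the point concrete. This Python sketch (the names HEX_DUMP and hex_dump_to_ascii are illustrative) turns the hex dump above back into text: it is the tail of the request line, starting with the space before HTTP/1.1, followed by the Upgrade and Connection header lines.

```python
# The byte sequence from the hex dump above, copied verbatim.
HEX_DUMP = """
20 48 54 54 50 2f 31 2e  31 0d 0a 55 70 67 72 61
64 65 3a 20 57 65 62 53 6f 63 6b 65 74 0d 0a 43
6f 6e 6e 65 63 74 69 6f 6e 3a 20 55 70 67 72 61
64 65 0d 0a
"""


def hex_dump_to_ascii(dump: str) -> str:
    # Since Python 3.7, bytes.fromhex skips all ASCII whitespace,
    # including the newlines and double spaces in the dump.
    return bytes.fromhex(dump).decode("ascii")


print(repr(hex_dump_to_ascii(HEX_DUMP)))
# ' HTTP/1.1\r\nUpgrade: WebSocket\r\nConnection: Upgrade\r\n'
```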

Does WebSocket obey the same-origin policy?

The "same-origin policy" is one of the cornerstones of web security. In essence, executable content in a page may only connect to the server from which the user loaded the page. Many recent web security breaches (such as the Gmail address book exploit and clickjacking) stem from subtle failures in enforcing the same-origin policy. It is unclear whether WebSocket is intended to follow the same-origin policy (the failure condition when the URL does not refer to the originating host is not documented), but for the security of the web we should insist that this policy remain in place.
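The rule argued for here can be sketched as a comparison of origins, where an origin is the (scheme, host, port) triple. In this illustrative Python version the function names are hypothetical, and pairing ws:// with http:// and wss:// with https://, as well as giving ws:// a default port of 80 (the "upgrade port 80" outcome discussed above), are assumptions of the sketch.

```python
from urllib.parse import urlsplit

# Assumption: ws:// shares port 80 with http://, wss:// shares 443 with https://.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


def origin_of(url: str) -> tuple:
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, (parts.hostname or "").lower(), port


def same_origin_socket_allowed(page_url: str, socket_url: str) -> bool:
    """Allow a WebSocket only to the host and port the page was loaded from."""
    page_scheme, page_host, page_port = origin_of(page_url)
    sock_scheme, sock_host, sock_port = origin_of(socket_url)
    expected_scheme = {"http": "ws", "https": "wss"}.get(page_scheme)
    return (sock_scheme == expected_scheme
            and sock_host == page_host
            and sock_port == page_port)
```

Under such a rule, a URL that does not refer to the originating host has a defined failure condition: the connection is simply refused.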

Is WebSocket subject to the HTTP two-connection limit?

This does not appear to be specified. However, since the WebSocket protocol carries no metadata, chaos would ensue if a single connection were used to multiplex the traffic of different WebSocket instances. The most natural interpretation is that a new TCP socket is created for each WebSocket object constructed in JavaScript. Typical usage, such as standalone Ajax components, would create a WebSocket for each component on the page, potentially resulting in hundreds of connections to the server. Strangely enough, the two-connection limit is the only fundamental obstacle to using HTTP for Ajax Push, and if we had control over how XMLHttpRequest used the underlying TCP connections, we would be in much better shape. The most dramatic benefit of WebSocket (and its greatest risk to scalability) should not be left unspecified. Note that establishing a socket is expensive, so providing a way to multiplex different endpoints of a protocol over a single connection (as HTTP can) is a useful optimization.
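Because the protocol as described gives each message no endpoint or channel field, sharing one connection would need an application-level tag. A minimal Python sketch of that idea: a hypothetical wrap() prefixes each message with a channel name, and a hypothetical Demultiplexer routes incoming messages to per-channel handlers. The newline separator is an arbitrary choice for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

SEP = "\n"  # hypothetical separator between channel name and payload


def wrap(channel: str, payload: str) -> str:
    """Prefix a payload with its channel name so several logical endpoints
    can share one connection."""
    if SEP in channel:
        raise ValueError("channel name must not contain the separator")
    return channel + SEP + payload


class Demultiplexer:
    """Route incoming messages to the handlers registered for their channel."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self.handlers[channel].append(handler)

    def dispatch(self, message: str) -> None:
        # partition splits at the first separator, so payloads may contain SEP
        channel, _, payload = message.partition(SEP)
        for handler in self.handlers.get(channel, []):
            handler(payload)
```

The cost is one small prefix per message, against one TCP connection (and its setup) per component in the unmultiplexed case.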

Can WebSocket read and write arbitrarily as with low-level socket APIs?

WebSocket communication is restricted to the WebSocket protocol (the connection setup plus 0xFF-delimited UTF-8 messages). It is argued that this improves security, because WebSocket clients are unlikely to be able to attack existing network services. However, if WebSocket becomes popular, most internet-facing systems will run applications that are vulnerable to attack through their WebSocket interfaces. Is the short-term benefit worth the long-term loss in flexibility, especially considering that a variety of existing plugins already allow low-level socket interaction with the originating host?

How does WebSocket delineate messages?

WebSocket framing terminates each message with 0xFF. This is efficient in terms of byte usage, but framing errors could easily occur if stray binary data containing a 0xFF byte finds its way into a message (and keep in mind that a framing failure is a critical failure in a protocol). Further, such framing errors would not be obvious from inspecting the TCP stream. (In contrast, MIME framing is unambiguous and requires no internal escaping of binary messages.)
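A Python sketch of sentinel framing shows the failure mode. It assumes the text-frame layout used by later drafts of the protocol (a 0x00 type byte, the UTF-8 payload, then 0xFF); the function names are illustrative. Valid UTF-8 never contains the byte 0xFF, so text is safe, but binary data is not.

```python
from typing import List

START, END = 0x00, 0xFF  # assumed frame-type byte and terminator


def encode_frame(text: str) -> bytes:
    # Valid UTF-8 never contains 0xFF, so text cannot collide with END.
    return bytes([START]) + text.encode("utf-8") + bytes([END])


def decode_frames(stream: bytes) -> List[str]:
    """Split a byte stream into messages, trusting 0xFF as the only delimiter."""
    messages = []
    i = 0
    while i < len(stream):
        if stream[i] != START:
            raise ValueError(f"framing error at offset {i}")
        end = stream.index(END, i + 1)  # raises ValueError if unterminated
        messages.append(stream[i + 1:end].decode("utf-8"))
        i = end + 1
    return messages
```

For example, a sender that writes the four binary bytes 01 FF 00 02 as a single message produces a stream that decode_frames reads back, without any error, as two one-byte messages; nothing in the stream reveals that a framing error occurred.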

How are function call semantics implemented over WebSocket?

WebSocket enforces no relationship between messages sent and received; the server may send multiple messages after the client sends a single one. This is not necessarily a drawback of the protocol; it is simply important to keep in mind that WebSocket does not enforce the request/response structure familiar from HTTP on the web.
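Function call semantics therefore have to be layered on top by the application. A minimal Python sketch, assuming JSON messages with field names borrowed loosely from JSON-RPC ("id", "method", "params", "result"); the CallCorrelator class is hypothetical. Replies are matched to calls by id, and anything without a matching id is treated as a server-pushed event.

```python
import itertools
import json
from typing import Any, Callable, Dict


class CallCorrelator:
    """Request/response semantics over a stream of unrelated messages."""

    def __init__(self, send: Callable[[str], None],
                 on_event: Callable[[dict], None]) -> None:
        self._send = send
        self._on_event = on_event
        self._ids = itertools.count(1)
        self._pending: Dict[int, Callable[[Any], None]] = {}

    def call(self, method: str, params: Any,
             on_result: Callable[[Any], None]) -> int:
        """Send a call and remember which callback its reply belongs to."""
        call_id = next(self._ids)
        self._pending[call_id] = on_result
        self._send(json.dumps({"id": call_id, "method": method, "params": params}))
        return call_id

    def receive(self, raw: str) -> None:
        """Deliver a reply to its caller, or pass the message on as an event."""
        msg = json.loads(raw)
        callback = self._pending.pop(msg.get("id"), None)
        if callback is not None:
            callback(msg.get("result"))
        else:
            self._on_event(msg)  # unsolicited push, or a reply nobody awaits
```

Replies may arrive in any order and interleaved with pushed events; the id, not the arrival order, ties each reply to its call.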

Is WebSocket easy to implement?

On the surface, implementation is straightforward; however, note that writing can occur simultaneously at both ends of the connection. If both ends block while writing more data than their TCP buffers can hold, and neither reads until its own write completes, deadlock can occur. The point here is not that the protocol should be designed to avoid simultaneous writing (as with HTTP/1.0); simultaneous writing is necessary to obtain the event-based interactivity we are after. The point is that WebSocket implementations added ad hoc to many different applications would lead to problems; in other words, ease of implementation is not as important as correctness in the protocol.
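One common way to avoid this deadlock is to keep reading while writing. A minimal Python sketch, assuming a blocking socket and that the caller knows how many bytes to expect (a real implementation would learn this from the framing); exchange is a hypothetical helper name.

```python
import socket
import threading


def exchange(sock: socket.socket, outgoing: bytes, expected_len: int) -> bytes:
    """Send `outgoing` while receiving `expected_len` bytes.

    Calling sock.sendall(outgoing) and only then reading can deadlock when
    both peers do it with payloads larger than the TCP buffers: each side
    blocks in sendall waiting for the other to read. Writing from a separate
    thread keeps this side draining the connection.
    """
    writer = threading.Thread(target=sock.sendall, args=(outgoing,))
    writer.start()
    chunks = []
    received = 0
    while received < expected_len:
        chunk = sock.recv(65536)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        received += len(chunk)
    writer.join()
    return b"".join(chunks)
```

An event loop with non-blocking sockets achieves the same effect without a thread; what matters is that no endpoint stops reading while it has a large write outstanding.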

Can we just upgrade HTTP?

So one interpretation is that the greatest benefit of WebSocket is its unspecified behavior with respect to TCP connections. Are there simple things we can do to improve HTTP for use with Ajax Push and Comet? After all, we want to make use of HTTP's framing and metadata features, and to benefit from its many standard, widely deployed implementations.

The first step is to let HTTP benefit (in a reasonable way) from the connection behavior that WebSocket leaves unspecified: if two JavaScript object contexts do not share a "window" object, they should not (by default) share TCP connections. This would allow multiple browser windows or tabs to open notification connections to the server without interfering with each other and without complex inter-window coordination to share a single connection. This step requires no API or protocol changes.
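The rule amounts to making the window part of the connection-pool key. A minimal Python sketch of such a browser-side pool; the Connection and ConnectionPool classes are hypothetical stand-ins and ignore per-host limits and connection lifetimes.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Origin = Tuple[str, str, int]  # (scheme, host, port)


@dataclass(eq=False)  # compare connections by identity, like real sockets
class Connection:
    """Stand-in for a TCP connection to one origin."""
    origin: Origin
    window_id: int


@dataclass
class ConnectionPool:
    """Reuse a connection only within the window context that opened it."""
    connections: Dict[Tuple[int, Origin], Connection] = field(default_factory=dict)

    def get(self, window_id: int, origin: Origin) -> Connection:
        key = (window_id, origin)  # the window is part of the key
        if key not in self.connections:
            self.connections[key] = Connection(origin, window_id)
        return self.connections[key]
```

Two tabs asking for the same server therefore get separate connections, so a long-lived notification request in one tab cannot occupy the other tab's connections.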

The next step is to fully support HTTP/1.1 in the browser (specifically, pipelining). By calling a proposed enablePipelining(true) method on an XMLHttpRequest object, a page could send multiple push notification requests to the server without waiting for one of the two TCP connections to be freed. Because HTTP/1.1 returns responses in request order, when a notification becomes available for one of the requests, all requests queued ahead of it would be unblocked with no-op responses. Again, this would allow more straightforward multi-window push implementations.
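The server's side of this scheme can be sketched in Python. The PipelinedNotifier class is hypothetical and models a single connection: since pipelined responses must go out in request order, a notification for a later request forces an empty (no-op) response to every earlier request still waiting.

```python
from collections import deque
from typing import Deque, List, Tuple


class PipelinedNotifier:
    """Pipelined long-poll requests on one HTTP/1.1 connection."""

    def __init__(self) -> None:
        self.waiting: Deque[str] = deque()  # request ids, in arrival order

    def request(self, request_id: str) -> None:
        self.waiting.append(request_id)

    def notify(self, request_id: str, body: str) -> List[Tuple[str, str]]:
        """Return the (request_id, body) responses written, in wire order."""
        if request_id not in self.waiting:
            raise KeyError(request_id)
        responses = []
        while True:
            current = self.waiting.popleft()
            if current == request_id:
                responses.append((current, body))
                return responses
            responses.append((current, ""))  # no-op: unblocks the client
```

With requests from three windows queued, a notification for the third produces two empty responses followed by the real one, and the first two windows must immediately re-issue their requests.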

Finally, we should consider extensions to the HTTP protocol itself, since a flurry of no-op responses when many windows are open is inefficient. With the introduction of a RequestTag HTTP header, an HTTP response could be uniquely associated with a request (other than by its position in the queue). This would allow out-of-order responses to pipelined requests and would make it possible to use HTTP in an event-driven fashion. Note that this is not just useful for notification-style applications; control over response ordering can reduce latency and server buffering requirements. With support for out-of-order responses, it would also be desirable to control which TCP connection is used for a given request, for example through an optionally specified connection "name".
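A Python sketch of the client side of this proposal. RequestTag is the header proposed here, not a standard one; the tag format ("r0", "r1", ...) and the helper names are illustrative assumptions, and the connection "name" is left out. Responses are matched by tag, so they may arrive in any order.

```python
from typing import Dict, Optional


def build_request(path: str, tag: str, host: str) -> bytes:
    """A pipelined GET carrying the proposed RequestTag header."""
    return (f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
            f"RequestTag: {tag}\r\n\r\n").encode("ascii")


def response_tag(raw_headers: bytes) -> Optional[str]:
    """Extract the RequestTag from a response header block, if present."""
    for line in raw_headers.decode("iso-8859-1").split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "requesttag":
            return value.strip()
    return None


class TaggedPipeline:
    """Match responses to pending requests by tag, not by queue position."""

    def __init__(self, host: str) -> None:
        self.host = host
        self.pending: Dict[str, str] = {}  # tag -> request path
        self.next_tag = 0

    def send(self, path: str) -> bytes:
        tag = f"r{self.next_tag}"
        self.next_tag += 1
        self.pending[tag] = path
        return build_request(path, tag, self.host)

    def on_response(self, raw_headers: bytes) -> str:
        """Return the path of the request this response answers."""
        tag = response_tag(raw_headers)
        if tag is None or tag not in self.pending:
            raise ValueError("response cannot be matched to a request")
        return self.pending.pop(tag)
```

Compared with in-order pipelining, a notification for one window no longer forces no-op responses to the others; only the tagged request completes.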