Many IT administrators use proxies extensively in their networks, but the concept of reverse proxying is slightly less common. So what is a reverse proxy? It refers to a setup where a proxy server is run in such a way that it appears to clients just like a normal web server.
Specifically, the client connects directly to the proxy, treating it as the final destination, i.e. the web server itself. The client is not aware that its requests may be relayed on to another server, which could even be an additional proxy server. These ‘reverse proxy servers’ are also often referred to as gateways, although this term can have other meanings too. To avoid confusion, we’ll avoid that description in this article.
The word ‘reverse’ refers to the backward role of the proxy server. A standard proxy acts as a proxy for the client: any request the proxy makes is made on behalf of a client request it has received. This is not the case in the ‘reverse’ scenario, because the proxy acts on behalf of the web server and not the client. The distinction can look confusing, since in both cases the proxy receives requests from clients and forwards them to servers, but it is important. You can read RFC 3040 for further information on this branch of internet replication and caching.
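One way to picture the difference is to look at who chooses the destination. The sketch below is only an illustration, not a working proxy: with a standard (forward) proxy the client names the destination in its request, while a reverse proxy relays every request to an origin picked by its operator. The function names and the `CONFIGURED_ORIGIN` address are hypothetical and chosen purely for this example.

```python
from urllib.parse import urlsplit

# Hypothetical origin address, set by whoever operates the reverse proxy.
CONFIGURED_ORIGIN = "http://10.0.0.5:8080"


def forward_proxy_target(request_target: str) -> str:
    """Forward proxy: the client supplies a full URL and the proxy goes there."""
    parts = urlsplit(request_target)
    if not parts.scheme or not parts.netloc:
        raise ValueError("a forward proxy expects an absolute URL from the client")
    return request_target


def reverse_proxy_target(request_target: str) -> str:
    """Reverse proxy: the client sends only a path; the proxy adds its own origin."""
    parts = urlsplit(request_target)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return CONFIGURED_ORIGIN + path
```

In the first function the client is in control of where the request ends up; in the second the client has no say, which is why it can treat the proxy as if it were the web server itself.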
A standard proxy is pretty much dedicated to the clients’ needs: all configured clients forward all their requests for web pages to the proxy server. In a standard network architecture, these proxies normally sit fairly close to the clients in order to reduce latency and network traffic. They are also normally run by the organisations themselves, although some ISPs will offer the service to larger clients.
A reverse proxy, by contrast, represents one or a small number of origin servers. You cannot normally access arbitrary servers through a reverse proxy, because it has to be configured specifically for the web servers it fronts. Often these servers need to be highly available and the caching aspect is important; a large organisation like Netflix would probably have specific IP addresses pointing at reverse proxies. The list of servers that are accessible should always be available from the reverse proxy server itself. A reverse proxy will normally be used by all clients to access those particular web resources; indeed, access by any other route may be completely blocked.
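The idea that a reverse proxy only reaches servers it has been explicitly configured for can be sketched as a simple lookup table. This is a minimal illustration rather than how any particular product does it: the hostnames, addresses and the `resolve_origin` helper are all assumptions made up for the example.

```python
from typing import Optional

# Hypothetical configuration: the only origins this reverse proxy will relay to.
ORIGINS = {
    "www.example.com": "http://10.0.0.5:8080",
    "static.example.com": "http://10.0.0.6:8080",
}


def resolve_origin(host_header: str) -> Optional[str]:
    """Return the configured origin for a request's Host header, or None.

    Simplification: an optional ":port" suffix is stripped by splitting on the
    first colon, so bracketed IPv6 literals are not handled.
    """
    host = host_header.split(":", 1)[0].strip().lower()
    return ORIGINS.get(host)
```

A request for `www.example.com` would be relayed to the first origin, while a request for any hostname not in the table returns `None` and would simply be refused, which is the sense in which ‘random’ servers cannot be reached.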
In this scenario it is usual for the reverse proxy to be both controlled and administered by the owner of the origin web server. This is because these servers are used for two primary purposes: to replicate content across a wide geographic area, and to replicate content for load balancing. In some scenarios a reverse proxy is also used to add an extra layer of security and authentication in front of a secure web server.
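As a rough illustration of the load-balancing role, the sketch below hands out replicated origin servers in turn. Round-robin is just one simple strategy picked for this example and is not something the article prescribes; the class name and the replica addresses are hypothetical.

```python
import itertools
from typing import Iterable


class RoundRobinBalancer:
    """Hands out origin servers one after another, wrapping around at the end."""

    def __init__(self, origins: Iterable[str]) -> None:
        origin_list = list(origins)
        if not origin_list:
            raise ValueError("at least one origin server is required")
        self._cycle = itertools.cycle(origin_list)

    def next_origin(self) -> str:
        """Return the origin that should receive the next request."""
        return next(self._cycle)


# Hypothetical replicas that all hold the same content.
balancer = RoundRobinBalancer(["http://10.0.0.5:8080", "http://10.0.0.6:8080"])
```

Each call to `balancer.next_origin()` returns the next replica in the list, so consecutive requests are spread evenly across the origins while clients still only ever see the reverse proxy.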