TCP proxy with netcat

From Noah.org
Revision as of 02:15, 1 June 2014 by Root


This article uses netcat to demonstrate a few simple TCP proxies.


This is a simple proxy for HTTP. It is a transparent proxy: it does not follow the HTTP/1.1 CONNECT method spec for proxies and just bounces data back and forth. Many protocols will not work properly when treated like this. True web proxies work with the web server and web browser to pass requests back and forth automatically and cleanly. You might be skeptical that the following command works at all, since a FIFO is just a one-way stream of bytes and there are no provisions for handling the sockets themselves. If either side of the connection breaks, the entire pipeline simply exits back to the shell.

mkfifo /tmp/fifo
# Browser -> nc listening on 8080 -> nc to www.noah.org:80; replies return through the FIFO.
nc -lk -p 8080 </tmp/fifo | nc www.noah.org 80 >/tmp/fifo
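
Nothing in this loop is specific to sockets, so the plumbing can be tried offline. Below is a minimal sketch that assumes a POSIX shell with mkfifo and mktemp. The functions front and back are hypothetical stand-ins for the listening nc and for the nc connected to www.noah.org. An ordinary pipe carries the request forward, and the FIFO carries the reply back to the first stage, just as in the command above.

```shell
#!/bin/sh
# Offline sketch of the FIFO loop behind the netcat proxy.
# front and back are hypothetical stand-ins for the two nc processes;
# no sockets are involved, only a pipe and a FIFO.
set -eu

# Plays the listening nc: passes the "browser" request down the pipe,
# then reads the reply that loops back through the FIFO on stdin.
front() {
    printf 'GET / HTTP/1.0\n'
    IFS= read -r reply
    printf 'front got: %s\n' "$reply" >&3
}

# Plays the nc connected to the web server: reads the request from the
# pipe and writes its answer into the FIFO.
back() {
    IFS= read -r request
    printf 'reply to %s\n' "$request"
}

run_loop() {
    loop_dir=$(mktemp -d)
    mkfifo "$loop_dir/fifo"
    # fd 3 carries front's report out, since front's stdout is the pipe.
    { front <"$loop_dir/fifo" | back >"$loop_dir/fifo"; } 3>&1
    rm -rf "$loop_dir"
}

run_loop
```

Running it prints `front got: reply to GET / HTTP/1.0`. Each side opens the FIFO in its own pipeline process, so neither open blocks the other forever.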

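Note that this will not work for virtual web sites (several sites served from one server). Web servers use the Host request header field to decide which virtual web site to serve. The browser is talking to the proxy, so it sends the proxy's name in Host (for example, Host: localhost:8080). The server cannot match that name to a site, so it returns an error like this: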

Site Temporarily Unavailable
We apologize for the inconvenience. Please contact the webmaster/ tech support immediately to have them rectify this.
error id: "bad_httpd_conf"

The simple transparent proxy is not smart enough to handle virtual web sites. The following HTTP proxy rewrites the Host: field in the HTTP request header to support them. This version also logs the client requests and server responses. Note that it does not rewrite HTML responses. Links in the web page therefore still point to the original web site, and requests made by clicking those links will not go through the proxy.

mkfifo /tmp/fifo
# Rewrite the Host header (keeping its trailing \r) and log both directions; needs GNU sed.
nc -lk -p 8080 </tmp/fifo | sed -u -e 's/^Host:.*/Host: www.noah.org\r/' | tee -a http_request.log | nc www.noah.org 80 | tee -a http_response.log >/tmp/fifo
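
The request rewrite can be checked without a network by feeding a hand-written request through the same kind of sed filter. This is a small sketch that assumes GNU sed. The helper name rewrite_host is hypothetical, and the sample request is made up. HTTP header lines end in \r\n. Because .* also matches the trailing \r, a replacement without \r leaves the rewritten Host line ending in a bare \n. Many servers tolerate that, but the sketch appends \r to keep the line well formed. The final sed -n l prints each line with \r shown explicitly and $ marking the line end.

```shell
#!/bin/sh
# Offline check of a Host rewrite like the one in the second proxy.
# Assumes GNU sed, which supports -u and \r in the replacement text.
set -eu

# The replacement ends with \r so the rewritten header keeps its CRLF ending.
rewrite_host() {
    sed -u -e 's/^Host:.*/Host: www.noah.org\r/'
}

# A made-up request as a browser might send it to the proxy on port 8080.
printf 'GET / HTTP/1.1\r\nHost: localhost:8080\r\nAccept: */*\r\n\r\n' \
    | rewrite_host \
    | sed -n l
```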

This version attempts a very unsophisticated rewrite of the HTML so that subsequent requests keep coming back through the proxy: the host name in URLs is rewritten to ${HOSTNAME}:8080. The URL rewriting handles URLs with and without the www. prefix simply by assuming both map to the same proxy. It also deletes request headers that would normally affect proxies and caching (Connection, If-None-Match, and If-Modified-Since). It deletes the Accept-Encoding request header to prevent the server from compressing the response (most web servers will gzip responses when the client allows it), because sed cannot rewrite URLs inside compressed data. Since it deliberately circumvents the normal headers used to control proxy connections, this is an improper HTTP proxy.

It is also not very reliable. It tends to hang, get stuck, or quit when either the client or the server closes its end of the connection. I believe this is caused by the FIFO, which cannot pass TCP control signals, so after a while the two sides get out of sync. Rewriting the page also changes its length, so the Content-Length header sent by the server may no longer match the body, which may add to the problem. At this point things are getting pretty sketchy, and it's amazing that this works at all.

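mkfifo /tmp/fifo
# Run under bash, which sets ${HOSTNAME}; GNU sed is needed for -u, \r and the i flag.
# Requests: rewrite Host, drop compression, connection and conditional headers, log.
# Responses: point www.noah.org and noah.org URLs back at this proxy, log.
nc -q -1 -l -p 8080 </tmp/fifo \
    | sed -u -e "s/^Host:.*/Host: www.noah.org\r/" -e "/^Accept-Encoding:.*/d" -e "/^Connection:.*/d" -e "/^If-None-Match:.*/d" -e "/^If-Modified-Since:.*/d" \
    | tee -i -a http_request.log \
    | nc -q -1 www.noah.org 80 \
    | sed -u -e "s/www\.noah\.org/${HOSTNAME}:8080/ig" \
    | sed -u -e "s/noah\.org/${HOSTNAME}:8080/ig" \
    | tee -i -a http_response.log >/tmp/fifo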
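
The request and response filters in this pipeline can also be exercised offline. This sketch assumes GNU sed. The names filter_request and filter_response, and the value myhost:8080 standing in for ${HOSTNAME}:8080, are hypothetical, and the sample request and HTML are made up. The sketch shows which request headers are stripped and how links are rewritten. It also prints the body length before and after rewriting, which shows why a Content-Length header passed through unchanged from the server stops matching the body.

```shell
#!/bin/sh
# Offline sketch of the two sed filters used by the third proxy.
# Assumes GNU sed. PROXY is a hypothetical stand-in for ${HOSTNAME}:8080.
set -eu

PROXY='myhost:8080'

# Request side: rewrite Host (keeping the \r) and drop the stripped headers.
filter_request() {
    sed -u -e 's/^Host:.*/Host: www.noah.org\r/' \
        -e '/^Accept-Encoding:/d' -e '/^Connection:/d' \
        -e '/^If-None-Match:/d' -e '/^If-Modified-Since:/d'
}

# Response side: point both URL forms back at the proxy, ignoring case.
filter_response() {
    sed -u -e "s/www\.noah\.org/$PROXY/ig" -e "s/noah\.org/$PROXY/ig"
}

req=$(printf 'GET / HTTP/1.1\r\nHost: localhost:8080\r\nAccept-Encoding: gzip\r\nConnection: keep-alive\r\nIf-None-Match: "abc"\r\n\r\n' \
    | filter_request)
# sed's l command shows each \r explicitly and marks line ends with $.
printf '%s\n' "$req" | sed -n l

body='<a href="http://www.noah.org/wiki/">wiki</a> <a href="http://noah.org/">home</a>'
new_body=$(printf '%s\n' "$body" | filter_response)
printf '%s\n' "$new_body"
# The rewritten body has a different length, so a Content-Length header
# copied unchanged from the server no longer describes it.
printf 'body length before: %s, after: %s\n' "${#body}" "${#new_body}"
```

With these sample values the rewritten body comes out two characters longer than the original.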