Thursday, June 14, 2007

Download web pages in BASH

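BASH is quite amazing. From Bhaskar V. Karambelkar's "Bash shell tricks" comes this gem (with some small corrections). It relies on a BASH feature: redirecting to or from /dev/tcp/HOST/PORT makes the shell itself open a TCP connection, with no external program involved.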

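function headers()
{
    local server=$1
    local port=${2:-80}
    # BASH itself opens a TCP socket for /dev/tcp/HOST/PORT; attach it to fd 5
    exec 5<>"/dev/tcp/$server/$port"
    # printf emits the CRLF line endings reliably, unlike echo -ne
    printf 'HEAD / HTTP/1.0\r\nHost: %s:%s\r\n\r\n' "$server" "$port" >&5
    cat <&5
    exec 5<&-    # close the socket
}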
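Because headers writes the raw response to standard output, the status code can be picked off the first line. A minimal sketch, assuming the server answers with a well-formed status line; http_status is a hypothetical helper name, not part of the original trick:

```shell
http_status()
{
    local version code reason
    # Status line format: HTTP-version SP status-code SP reason-phrase CRLF
    read -r version code reason
    # Strip the carriage return in case the reason phrase is missing
    printf '%s\n' "$code" | tr -d '\r'
}
```

Something like `headers www.example.com | http_status` would then print just the numeric code.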

I work behind a corporate firewall, so I need web proxy settings. No problem, BASH is still amazing:

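function webget()
{
    local url=$1
    local rest=${url#*://}      # strip the scheme, e.g. "http://"
    local server=${rest%%/*}    # host name, up to the first slash

    # http_proxy_server and http_proxy_port must already be set
    exec 5<>"/dev/tcp/$http_proxy_server/$http_proxy_port"
    # A proxy expects the full URL, not just the path, in the request line
    printf 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' "$url" "$server" >&5
    cat <&5
    exec 5<&-    # close the socket
}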

Usage is obvious, once http_proxy_server and http_proxy_port are set for your proxy:

$ webget http://www.ccil.org/jargon/
# Out pops the top page for Jargon File Resources
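webget reads http_proxy_server and http_proxy_port, which are not standard variables. A rough sketch of filling them from the conventional http_proxy variable follows; set_proxy_vars is a hypothetical name, falling back to port 80 is my assumption, and IPv6 literals are not handled:

```shell
set_proxy_vars()
{
    # Take the proxy URL from the first argument, else from $http_proxy
    local proxy=${1:-$http_proxy}
    proxy=${proxy#*://}     # drop the scheme, e.g. "http://"
    proxy=${proxy%%/*}      # drop any trailing path
    proxy=${proxy##*@}      # drop "user:password@" if present
    http_proxy_server=${proxy%%:*}
    case $proxy in
        *:*) http_proxy_port=${proxy##*:} ;;
        *)   http_proxy_port=80 ;;    # assumed default when no port is given
    esac
}
```

Calling `set_proxy_vars` with no argument before `webget` should be enough when http_proxy is already exported.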

UPDATE: Also useful for talking to SMTP (email) servers:

$ exec 5<>/dev/tcp/localhost/smtp
$ read -u 5 line
$ echo $line
220 my.full.host.name ESMTP ...
$ printf 'QUIT\r\n' >&5
$ read -u 5 line
$ echo $line
221 2.0.0 my.full.host.name closing connection

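This also shows that BASH accepts a service name in place of a port number: it resolves smtp to port 25 through the system's services database (typically /etc/services).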
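The same redirection makes a quick "is anything listening?" check. A minimal sketch; port_open is a hypothetical name, and the call can block for a while if a firewall silently drops the packets:

```shell
port_open()
{
    # Open the connection inside a subshell so fd 5 never leaks into the
    # calling shell; the subshell's exit status says whether connect worked.
    (exec 5<>"/dev/tcp/$1/$2") 2>/dev/null
}
```

For example, `port_open localhost smtp && echo "mail server is up"` uses a service name just like the session above.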

3 comments:

noah said...

Very interesting, but what was wrong with wget?

Brian Oxley said...

That's easy to answer: nothing. I would not use BASH for my daily web page saving, but the technique is still of general utility.

WizBribe said...

Thanks a lot for this post. I was wondering how to test a remote web server with only a kernel and a shell. No wget, no Perl, no lynx on the box. Your five-line script did the trick!