|Astonish Results Blog|
httpmirror - Mirror Linux distributions using http
Here is the deal...
You are at work. Your network admin uses restrictive firewall rules. You can't use rsync or ftp to create a local mirror of your favorite Linux distribution. Your only door to Linux mirrors is through http. What to do?
I took this as an opportunity to brush up on my rusty Python scripting skills and cobbled together httpmirror.
httpmirror is a rough script that lets you mirror Linux distributions over http. It does two main things:
Recursively download all the files from a Linux http mirror.
Recursively download a list of all the files from a Linux http mirror.
Recursive downloading of files is experimental, so use it only if you have a very reliable connection and feel adventurous. That said, resuming works in this mode, whereas it doesn't when you are only creating a list of files.
Alternatively, you can generate only a list of files to download and then feed that list to your favorite download program. This is the recommended way.
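A rough idea of how such a file list could be built, sketched in Python 3 (the original script dates from 2008, so this is not its actual code). The names LinkParser, fetch_page and list_files are made up for illustration, and the sketch assumes that the mirror serves plain directory index pages and that the base url ends with a slash.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Download one index page and return its HTML as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def list_files(base_url, fetch=fetch_page):
    """Walk an http directory index and return the urls of all files below it.

    Assumption: base_url ends with "/" and every directory link does too.
    """
    files = []
    pending = [base_url]
    seen = set()
    while pending:
        url = pending.pop()
        if url in seen:
            continue
        seen.add(url)
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.links:
            target = urljoin(url, href)
            # Skip column-sort links ("?C=N;O=D"), links that leave the
            # current directory (parent, other hosts) and self links.
            if "?" in href or not target.startswith(url) or target == url:
                continue
            if target.endswith("/"):
                pending.append(target)  # a subdirectory: visit it later
            else:
                files.append(target)    # a regular file
    return sorted(files)
```

Writing the returned urls to out.txt, one per line, gives a file that a downloader such as wget can read with its -i option.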
To get a better picture of what you can do with httpmirror, here is its output when you don't provide any command-line arguments:
    a@a:~/Desktop/python-to-go$ ./httpmirror.py
    Usage: httpmirror.py [options]

    Options:
      -h, --help  show this help message and exit
      -x NOLIST   a comma-separated list of words that should be avoided
      -m URL      The base url. This will download all the files
      -l URLLIST  The base url. This will only save the list of files
      -o FILE     The file output should you decide to only download the file list
The -x option lets you define exceptions, that is, words that mark files or directories to skip. You will often need to keep certain files or directories from being fetched. For example, if you are only interested in the i386 arch distribution and want to exclude amd64, sparc, and powerpc, you'd add '-x amd64,sparc,powerpc'.
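To make the idea concrete, here is a minimal sketch of such a filter in Python 3. It assumes plain substring matching against the whole url, which may differ from what httpmirror really does, and the function names parse_exclusions and is_excluded are hypothetical.

```python
def parse_exclusions(option_value):
    """Split a value such as 'amd64,sparc,powerpc' into a list of words."""
    if not option_value:
        return []
    return [word.strip() for word in option_value.split(",") if word.strip()]


def is_excluded(url, words):
    """Return True if any excluded word occurs anywhere in the url.

    Assumption: simple substring matching, not whole path components.
    """
    return any(word in url for word in words)


words = parse_exclusions("amd64,sparc,powerpc")
urls = [
    "http://example.test/debian/dists/stable/main/binary-i386/Packages.gz",
    "http://example.test/debian/dists/stable/main/binary-amd64/Packages.gz",
]
# Keep only the urls that match none of the excluded words.
kept = [url for url in urls if not is_excluded(url, words)]
```

With substring matching a short word can exclude more than intended, so it pays to pick words that only appear in the paths you want to skip.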
You MUST specify either -m with a url or -l with a url.
The -o option is optional and only useful if you want to generate a list of files to download and want to give it a custom file name. The default is out.txt.
To use a proxy server, the 'http_proxy' environment variable must be set (in Linux you'd enter 'export http_proxy=http://myproxy:port').
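The option rules above could be expressed roughly as follows. This is a sketch using Python 3's argparse, not the script's real parser; the function name build_parser and the destination names are made up, and treating -m and -l as mutually exclusive is an assumption read into "either -m or -l".

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options described above (hypothetical)."""
    parser = argparse.ArgumentParser(prog="httpmirror.py")
    parser.add_argument(
        "-x", dest="nolist", default="",
        help="a comma-separated list of words that should be avoided")
    # Exactly one of -m / -l has to be given (assumed to be exclusive).
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument(
        "-m", dest="url",
        help="The base url. This will download all the files")
    mode.add_argument(
        "-l", dest="urllist",
        help="The base url. This will only save the list of files")
    parser.add_argument(
        "-o", dest="file", default="out.txt",
        help="The file output should you decide to only download the file list")
    return parser


# Example: only build the list, skipping two architectures.
args = build_parser().parse_args(
    ["-l", "http://example.test/debian/", "-x", "amd64,sparc"])
```

Here args.file falls back to out.txt because -o was not given, and calling the parser with neither -m nor -l exits with a usage error.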
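For the curious, this is roughly how a Python 3 program can pick up that variable. Python's urllib already honours http_proxy on its own, so the sketch below only makes the step explicit; the helper names proxies_from_env and build_opener_from_env are hypothetical and not taken from httpmirror.

```python
import os
import urllib.request


def proxies_from_env(environ=None):
    """Return a proxy mapping for urllib based on the http_proxy variable."""
    if environ is None:
        environ = os.environ
    proxy = environ.get("http_proxy")
    return {"http": proxy} if proxy else {}


def build_opener_from_env(environ=None):
    """Create a urllib opener that sends http requests through the proxy.

    An empty mapping tells urllib to connect directly.
    """
    handler = urllib.request.ProxyHandler(proxies_from_env(environ))
    return urllib.request.build_opener(handler)


# Example with an explicit mapping instead of the real environment;
# "myproxy" and port 3128 are placeholders.
opener = build_opener_from_env({"http_proxy": "http://myproxy:3128"})
```

Calling opener.open(url) would then fetch the url through the configured proxy.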
Q. I can make wget do the same thing!
A. Sure you can, but you won't be able to filter out files/directories using pre-set criteria.
Download and Run
To run the script in Linux, first make it executable with 'chmod +x ./httpmirror.py' and then run './httpmirror.py'. In Windows, simply run 'python httpmirror.py'.
0.0.3 - 23-May-2008
* Exceptions are now optional
* minor bug fix (the os.linesp() issue)