httpmirror - Mirror Linux distributions using http

Here is the deal...

You are at work. Your network admin uses restrictive firewall rules. You can't use rsync or ftp to create a local mirror of your favorite Linux distribution. Your only door to Linux mirrors is through http. What to do?

I took this as an opportunity to brush up on my rusty Python scripting skills and cobbled together httpmirror.

httpmirror is a rough script that lets you mirror Linux distributions over http. It provides two main functions:

  1. Recursively download all the files from a Linux http mirror.

  2. Recursively download a list of all the files from a Linux http mirror.
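
To give an idea of what "recursively" means here, below is a minimal sketch of walking an http directory index. It is not the code from httpmirror.py. The names LinkCollector, split_index and list_files are made up for illustration, and it assumes the mirror serves plain HTML index pages and that every directory URL ends with a slash.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def split_index(base_url, html):
    """Return (files, subdirs) as absolute URLs found below base_url."""
    parser = LinkCollector()
    parser.feed(html)
    files, subdirs = [], []
    for href in parser.hrefs:
        url = urljoin(base_url, href)
        # Skip column-sort links, the parent directory and anything off the mirror.
        if "?" in href or not url.startswith(base_url) or url == base_url:
            continue
        (subdirs if url.endswith("/") else files).append(url)
    return files, subdirs


def list_files(base_url, fetch):
    """Walk the mirror depth-first; fetch(url) must return that page's HTML."""
    found, todo = [], [base_url]
    while todo:
        url = todo.pop()
        files, subdirs = split_index(url, fetch(url))
        found.extend(files)
        todo.extend(subdirs)
    return found
```

Passing fetch in as a function keeps the walk separate from the network code. For a real run it could be something like a small wrapper around urllib.request.urlopen that reads and decodes the page.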

Recursive downloading of files is experimental, so use it only if you have a very reliable connection and feel adventurous. That said, resuming works in this mode, whereas it doesn't when you are only creating a list of files.

Alternatively, you can generate only a list of files to download and then feed that list to your favorite download program. This is the recommended way.
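
If you'd rather post-process the list yourself, here is a small sketch of turning the entries into local paths that keep the mirror's directory layout. It assumes the list holds one URL per line, which is a guess about the file format. read_url_list, local_path and the "mirror" destination directory are hypothetical names, not part of httpmirror.

```python
import os
from urllib.parse import urlsplit


def read_url_list(lines):
    """Return the non-empty lines, stripped (assumes one URL per line)."""
    return [line.strip() for line in lines if line.strip()]


def local_path(url, dest_root="mirror"):
    """Map a file URL to a path under dest_root, keeping host and directories."""
    parts = urlsplit(url)
    return os.path.join(dest_root, parts.netloc, *parts.path.strip("/").split("/"))
```

For example, you could open the list file, pass the file object to read_url_list, and create os.path.dirname(local_path(url)) for each entry before handing the URL to your downloader.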

To get a better picture of what you can do with httpmirror, here is its output when you don't provide any command-line arguments:

a@a:~/Desktop/python-to-go$ ./httpmirror.py
Usage: httpmirror.py [options]

Options:
-h, --help  show this help message and exit
-x NOLIST   a comma-separated list of words that should be avoided
-m URL      The base url. This will download all the files
-l URLLIST  The base url. This will only save the list of files
-o FILE     The file output should you decide to only download the file list

  • The -x option allows the use of exceptions. Oftentimes you need to exclude certain files/directories from being fetched. For example, if you are only interested in the i386 arch distribution and want to exclude amd64, sparc, and powerpc, you'd add the following: '-x amd64,sparc,powerpc'.

  • You MUST specify either -m with a URL or -l with a URL.

  • The -o option is optional and only useful if you want to generate a list of files to download and you want to specify a custom file name. The default is out.txt.

  • To use a proxy server, the 'http_proxy' environment variable must be set (on Linux you'd enter 'export http_proxy=http://myproxy:port').
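
The -x exclusion described above can be pictured roughly like this. It is a sketch only: it assumes a plain substring match against the URL, which may not be exactly how httpmirror.py compares words, and parse_nolist and is_excluded are made-up names.

```python
def parse_nolist(nolist):
    """Split the comma-separated -x value into a list of words."""
    return [word.strip() for word in nolist.split(",") if word.strip()]


def is_excluded(url, words):
    """True if any excluded word appears anywhere in the URL (assumed rule)."""
    return any(word in url for word in words)


words = parse_nolist("amd64,sparc,powerpc")
urls = [
    "http://example.org/debian/dists/stable/main/binary-i386/Packages.gz",
    "http://example.org/debian/dists/stable/main/binary-amd64/Packages.gz",
]
# Keep only the URLs that mention none of the excluded words.
kept = [url for url in urls if not is_excluded(url, words)]
```

Note that a substring match is blunt: a word like 'sparc' would also drop any file that merely has 'sparc' somewhere in its name, so pick your words with care.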

FAQs:

Q. I can make wget do the same thing!

A. Sure you can, but you won't be able to filter out files/directories using pre-set criteria. Plus, I'm only doing this to refresh my Python skills, nothing more and nothing less :P

Download and Run

To run the script on Linux, make it executable with 'chmod +x ./httpmirror.py' and then run './httpmirror.py'. On Windows, simply run 'python httpmirror.py'.

Download httpmirror (0.0.2) 5.2 KB

