Website copiers
Revision as of 17:10, 26 January 2010
All in all, I find wget better (more intuitive), even if it seems less configurable.
wget
- wget can act as a powerful mirroring tool. Use it like this:
wget -p -k -H -nH -nd -E www.apple.com
- The -p option downloads dependent files like images and CSS, -k activates link rewriting, -E activates file renaming (e.g. turning a .php file into an .html one), and -H means you can download from hosts other than the original one.
- -nH and -nd are minor options affecting how directories are created: -nH skips creating a directory named after the host, and -nd skips creating directories altogether.
- You can specify a link depth level with the -l option (compared to HTTrack, it is one less, e.g. a link level of 1 will already fetch the linked URLs, which is much more logical).
- If you need to disable robot exclusion (robots.txt and others), use the -e robots=off command line option.
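Putting the options above together, a depth-1 recursive mirror with robot exclusion disabled might look like the sketch below. The host www.example.com and the exact flag combination are illustrative; the command is stored in a variable and printed rather than executed, since the target is a placeholder.

```shell
# Depth-1 recursive mirror: -r -l 1 enables recursion one level deep,
# -p fetches page requisites, -k rewrites links, -E renames files,
# and -e robots=off disables robot exclusion.
# Printed rather than run, because www.example.com is a placeholder host.
cmd='wget -r -l 1 -p -k -E -e robots=off http://www.example.com/'
echo "$cmd"
```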
Limitations
- You cannot rewrite URLs to a hardcoded location.
- You cannot give the downloaded main page a different name.
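Both limitations can be worked around after the download with ordinary shell tools. A minimal sketch, assuming the mirror landed in a directory called mirror and the entry page is index.html (both names are hypothetical; a tiny fake mirror is created here so the sketch runs as-is):

```shell
# Demo setup: a tiny fake mirror (in real use this is wget's output directory).
mkdir -p mirror
printf '<a href="index.html">home</a>\n' > mirror/other.html
printf '<p>main page</p>\n' > mirror/index.html

# Workaround: rename the main page, then rewrite references to the old
# name in every downloaded HTML file (GNU sed's in-place edit).
cd mirror
mv index.html start.html
for f in *.html; do
  sed -i 's/index\.html/start.html/g' "$f"
done
```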
HTTrack
- This software seems useful but is quite complex and not very intuitive. Some important options:
- -n: this activates fetches for related elements like CSS files and images (same as -p on wget). However, it won't activate rewriting on those elements unless you use a higher link depth level, which makes it less powerful than wget.
- -e: similar to -H on wget.
- -r10: this specifies the link depth level (10 in this example). Note that it starts at 2, not 1.
- You can change the original directory hierarchy structure with some options.
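Combining the options above into a single HTTrack invocation might look like the sketch below. This is not verified against every HTTrack version; www.example.com and the ./mirror output directory are placeholders, and the command is printed rather than executed so the sketch is safe to run as-is.

```shell
# -O names the output directory, -r10 sets the link depth,
# and -n fetches nearby non-HTML files, per the notes above.
# Printed rather than run, because www.example.com is a placeholder host.
cmd='httrack http://www.example.com/ -O ./mirror -r10 -n'
echo "$cmd"
```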