Download a Whole Website in Linux the Smart Way!
Have you ever Googled for software to download a complete website, only to find a Windows program (or maybe the odd Linux one)? Did you know that your Linux box already has a nifty command that makes all that trouble go away and downloads a full website in one go? Yes, wget does it, and here is the command. Just copy-paste it into the shell and edit the website details at the end.
$ wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains techstroke.com \
--no-parent \
www.techstroke.com/Windows/
This command downloads the Web site www.techstroke.com/Windows/.
The options are:
- --recursive: download the entire Web site.
- --domains techstroke.com: don't follow links outside techstroke.com.
- --no-parent: don't follow links outside the directory /Windows/.
- --page-requisites: get all the elements that compose the page (images, CSS and so on).
- --html-extension: save files with the .html extension.
- --convert-links: convert links so that they work locally, off-line.
- --restrict-file-names=windows: modify filenames so that they will work in Windows as well.
- --no-clobber: don't overwrite any existing files (useful in case the download is interrupted and resumed).
All these options together give you a perfectly browsable offline copy, with all images, JavaScript and CSS intact!
via [linuxJournal]
Hi, I tried that with a web page. The site is HTTPS, but everything was OK when using --no-check-certificate.
I used only "-r --no-check-certificate --page-requisites". The problem: the main CSS file is saved, and in it the other CSS files are linked with
@import url(bla.css);
@import url(blubb.css);
…
and none of these are saved at all. Is there a wget tweak that makes wget save these files as well?
--page-requisites by itself should be enough to make wget do it, but it seems it just won't. I'm using wget 1.11.1.
@Rava I don't think there is such a tweak; wget parses HTML files but not CSS ones. I went through the wget manual and found nothing regarding this. You can Google for a dedicated website copy tool for Linux if you need this done, or fetch the imported files yourself with a small script like the sketch below.
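Here is a rough workaround sketch, assuming the mirror has already been downloaded into the current directory and that the @import paths are relative to a single base URL (the BASE_URL value below is an illustrative placeholder, not something from the post): it scans each downloaded .css file for @import url(...) lines and fetches the referenced files with wget.

#!/bin/bash
# Workaround sketch: wget 1.11 does not parse CSS, so fetch @import-ed files ourselves.
# BASE_URL is an illustrative placeholder; point it at the directory serving the main CSS file.
BASE_URL="http://www.techstroke.com/Windows"

# For every downloaded .css file, pull the file names out of @import url(...) lines
# and download each one next to the importing stylesheet.
find . -name '*.css' | while read -r css; do
    dir=$(dirname "$css")
    grep -o '@import url([^)]*)' "$css" | sed 's/@import url(\(.*\))/\1/' | tr -d "\"'" | \
    while read -r imported; do
        wget --no-clobber --directory-prefix="$dir" "$BASE_URL/$imported"
    done
done

This is a simplification (it does not resolve paths relative to nested stylesheets), but it covers the flat case described in the comment.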
Wow…
Thanks for sharing this … the best
Sweet!
Just snatched a site into Linux Mint 8. Had to replace the leading "-" characters with "--", as in -recursive became --recursive, but then it worked great. Rusty old ex-tech's request: can you make it a simple bash script? I suspect it's just the above with variables, but I forget how to parse them. Please don't say RTFM; I lack the time this month, but could really use this.
Many thanks.
-Brad
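A minimal sketch of such a wrapper script, in answer to the request above (the script name, variable names and usage line are illustrative, not from the post); it simply parameterizes the command with the start URL and the domain wget is allowed to follow:

#!/bin/bash
# mirror-site.sh -- illustrative wrapper around the wget command from the post.
# Usage: ./mirror-site.sh http://www.techstroke.com/Windows/ techstroke.com
URL="$1"
DOMAIN="$2"

if [ -z "$URL" ] || [ -z "$DOMAIN" ]; then
    echo "Usage: $0 <start-url> <domain>" >&2
    exit 1
fi

wget \
    --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --restrict-file-names=windows \
    --domains "$DOMAIN" \
    --no-parent \
    "$URL"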
Please send me the code to download a whole website.
When I tried wget-recursive it prompted "no such command found".
Aditya, the option is --recursive with a space after wget, i.e. wget --recursive, because --recursive is a command-line argument to the wget command; typing wget-recursive as one word is why it said "no such command found". Also, just copy-paste the code after the $ prompt given in this post and replace the techstroke.com URL at the end with the website you wish to download, and it will be done!
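For example, the whole command can also be pasted on a single line; the example.com address here is only a placeholder for whatever site you want to mirror:

$ wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains example.com --no-parent www.example.com/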
As of wget 1.12, --html-extension is renamed to --adjust-extension.
@Becry, thanks for pointing that out; that'll help other people reading this article.
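For readers on wget 1.12 or later, the command from the post stays the same apart from that one renamed flag:

$ wget \
--recursive \
--no-clobber \
--page-requisites \
--adjust-extension \
--convert-links \
--restrict-file-names=windows \
--domains techstroke.com \
--no-parent \
www.techstroke.com/Windows/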