Have you ever wondered how to easily download a full website to PDF or JPG? Of course, you can use the “Print” functionality of your browser and choose the “Save as PDF” option, or simply take a screenshot. The situation gets a little more complicated if the website is very long or if you need to download multiple websites/pages at once. Today we want to show you how we do it!
We will be using a Linux shell and bash to accomplish that goal.
Let us download 10 pages with incrementing numbers. For that purpose we will use the wkhtmltopdf tool, which you can install by running the following command on Debian-based Linux systems:
sudo apt install wkhtmltopdf
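To confirm the installation worked, you can ask wkhtmltopdf for its version (just a quick sanity check; the exact version string depends on your distribution):
wkhtmltopdf --version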
After that you can create a file by running:
touch download.sh
and then
chmod +x download.sh
to make the file executable. Now open the file in your favorite editor and paste the following lines.
#!/bin/bash
# Download pages 1 to 10 and convert each one to an A4 PDF
for i in {1..10}
do
    url="https://someurl.com/page_$i.xhtml"
    wkhtmltopdf -s A4 --disable-smart-shrinking --zoom 1.0 "$url" "output_file_$i.pdf"
done
Now you can save the file and then run it by executing:
./download.sh or bash download.sh (use bash rather than plain sh, because the script relies on bash’s {1..10} brace expansion)
Now let us explain the options: -s A4 sets the output page size, and --disable-smart-shrinking keeps wkhtmltopdf from rescaling the content on its own. There is also the --zoom parameter, which you can increase if the PDF printout is not taking up the entire A4 page.
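For example, if the content still looks too small, you could test a single page with a higher zoom value before re-running the whole loop (the URL is the same placeholder as above, and 1.3 is just an illustrative value):
wkhtmltopdf -s A4 --disable-smart-shrinking --zoom 1.3 "https://someurl.com/page_1.xhtml" test_page_1.pdf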
Sometimes we experience issues with wkhtmltopdf: for some websites we simply get a blank page as a result. The best solution we came up with was to use the “cutycapt” tool. Here is how you can do it:
#!/bin/bash
# Capture pages 1 to 10 as PNG screenshots with CutyCapt
for i in {1..10}
do
    url="https://someurl.com/page_$i.xhtml"
    cutycapt --min-width=1024 --min-height=1280 --zoom-factor=1.0 --url="$url" --out="output_page_$i.png"
done
You might need to install cutycapt first. On Debian-based systems you can do it by running:
sudo apt install cutycapt
As you can see, the cutycapt command is very similar to wkhtmltopdf; it also has a --zoom-factor parameter which you can increase to fill the entire page.
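For example, a quick single-page test with a larger zoom factor could look like this (the URL is a placeholder and 1.5 is just an illustrative value):
cutycapt --min-width=1024 --min-height=1280 --zoom-factor=1.5 --url="https://someurl.com/page_1.xhtml" --out=test_page_1.png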
This approach is very similar to the first one: you just need to replace the wkhtmltopdf command with wkhtmltoimage. Here is how the script would look:
wkhtmltoimage ships together with the wkhtmltopdf package, so if you installed it in the first step you should already have it:
sudo apt install wkhtmltopdf
Then create a file as before and paste:
#!/bin/bash
# Download pages 1 to 10 and save each one as a JPG image
for i in {1..10}
do
    url="https://someurl.com/page_$i.xhtml"
    wkhtmltoimage --width 900 --height 1280 --zoom 1.6 "$url" "output_file_$i.jpg"
done
Similarly, we are downloading 10 pages, but this time the output is saved as JPG files. We can control the image size by adjusting the --width, --height and --zoom parameters.
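As a quick illustration, capturing a single page at a larger size and explicitly requesting JPG output might look like this (the --format option selects the image format; the values are only examples):
wkhtmltoimage --format jpg --width 1200 --height 1600 --zoom 1.2 "https://someurl.com/page_1.xhtml" test_page_1.jpg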
The title of the post was “How to save a full website to PDF”, so you might be wondering now: what do I do with multiple PDF files?
We have an answer to that too!
Just open your terminal in the folder where you downloaded the pages and run the following command:
pdfunite $(ls -v *.pdf) output.pdf
pdfunite is the tool we use to join multiple PDF files into one. Using ls -v makes sure the files are joined together in the right (natural numeric) order 🙂
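If the pdfunite command is missing, it is provided by the poppler-utils package on Debian-based systems. Assuming the file names produced by the first script (the output_file_ prefix), the full sequence would be something like:
sudo apt install poppler-utils
pdfunite $(ls -v output_file_*.pdf) output.pdf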
That’s it for now. If you have any questions, please leave a comment below.