How to save a full website to PDF or JPG using Bash

Have you ever wondered how to easily save a full website as a PDF or JPG? Of course, you can use your browser's “Print” function and choose the “Save as PDF” option, or simply take a screenshot. Things get more complicated if the page is very long or if you need to download multiple pages at once. Today we want to show you how we do it!

We will use the Linux shell and a short Bash script to accomplish that.

How to save a website to PDF – approach 1

Let us download 10 pages whose addresses contain incrementing numbers. For that purpose we will use the wkhtmltopdf tool, which you can install on Debian-based Linux systems by running the following command:

sudo apt install wkhtmltopdf

After that you can create a file by running:

touch download.sh

and then

chmod +x download.sh

to make the file executable. Now open the file in your favorite editor and paste the following lines.

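#!/bin/bash
# Download pages 1 to 10 and save each one as a PDF.
for i in {1..10}
do
  # Replace this address with the page you want to download.
  declare url="https://someurl.com/page_$i.xhtml"
  # Quoting keeps addresses containing characters such as & or ? intact.
  wkhtmltopdf -s A4 --disable-smart-shrinking --zoom 1.0 "$url" "output_file_$i.pdf"
done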

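Now save the file and run it by executing:

./download.sh or bash download.sh

Avoid running it as sh download.sh: on Debian-based systems sh is usually dash, which does not expand {1..10}, so the loop would run only once with the literal text {1..10}.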

Now let us explain:

  1. #!/bin/bash is the opening line of a Bash script.
  2. Next, we start a loop which runs 10 times; on each pass $i takes the next number from 1 to 10.
  3. We declare a variable called “url”, which you will have to replace with the address of the page that you want to download.
  4. Finally, we save each page as a PDF file; the files will be named output_file_1.pdf, output_file_2.pdf and so on.

There is also the --zoom parameter, which you can increase if the PDF printout does not fill the entire A4 page.
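If you want to check what the loop is going to do before it touches the network, a dry-run variant can help. The following is only a sketch: the helper names page_url, pdf_name and render_page, the base_url, first and last variables and the DRY_RUN switch are our own illustrative assumptions and not part of wkhtmltopdf. With DRY_RUN=1 it only prints the commands it would run.

```shell
#!/bin/bash
# Sketch only: base_url, first, last and DRY_RUN are illustrative assumptions.
base_url="https://someurl.com"   # placeholder address used in this article
first=1
last=3

# Build the address of page number $1.
page_url() {
  printf '%s/page_%s.xhtml' "$base_url" "$1"
}

# Build the output file name for page number $1.
pdf_name() {
  printf 'output_file_%s.pdf' "$1"
}

# Render page number $1; with DRY_RUN=1 only print the command.
render_page() {
  local url out
  url=$(page_url "$1")
  out=$(pdf_name "$1")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "wkhtmltopdf -s A4 --disable-smart-shrinking --zoom 1.0 $url $out"
  else
    wkhtmltopdf -s A4 --disable-smart-shrinking --zoom 1.0 "$url" "$out" \
      || echo "failed to render $url" >&2
  fi
}

# Preview the commands without downloading anything.
DRY_RUN=1
for i in $(seq "$first" "$last"); do
  render_page "$i"
done
```

Once the printed commands look right, set DRY_RUN=0 (or remove the assignment) to perform the real downloads. A page that fails is reported on standard error and the loop carries on with the next one.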

How to save a website to PDF – approach 2

Sometimes we run into issues with wkhtmltopdf: for some websites we simply get a blank page as a result. The best solution that we came up with was to use the “cutycapt” tool instead. Here is how you can do it:

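#!/bin/bash
# Same loop as before, but rendering with cutycapt.
for i in {1..10}
do
  declare url="https://someurl.com/page_$i.xhtml"
  # cutycapt picks the output format from the file extension, so .pdf gives a PDF.
  cutycapt --min-width=1024 --min-height=1280 --zoom-factor=1.0 --url="$url" --out="output_page_$i.pdf"
done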

You might need to install cutycapt first; on Debian-based systems you can do it by running:

sudo apt install cutycapt

As you can see, the cutycapt command is very similar to wkhtmltopdf. It also has a --zoom-factor parameter, which you can increase to fill the entire page.
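The two approaches can also be combined, so that cutycapt is only used when wkhtmltopdf produces nothing useful. The sketch below is one possible way to do that and rests on a crude assumption: a result smaller than min_bytes is treated as a failed render. The 2000-byte threshold and the helper names looks_rendered and render_with_fallback are hypothetical, and the threshold would need tuning against your own pages, because a blank PDF is not guaranteed to be that small.

```shell
#!/bin/bash
# Sketch only: min_bytes and both helper names are illustrative assumptions.
min_bytes=2000

# Succeed when file $1 exists and is at least $min_bytes bytes long.
looks_rendered() {
  [ -f "$1" ] && (( $(wc -c < "$1") >= min_bytes ))
}

# Render address $1 into file $2: try wkhtmltopdf first and
# fall back to cutycapt when the result is missing or suspiciously small.
render_with_fallback() {
  local url=$1 out=$2
  wkhtmltopdf -s A4 --disable-smart-shrinking "$url" "$out" 2>/dev/null
  if ! looks_rendered "$out"; then
    cutycapt --min-width=1024 --min-height=1280 --url="$url" --out="$out"
  fi
}
```

Inside the loop you would then call, for example, render_with_fallback "$url" "output_file_$i.pdf" instead of calling either tool directly.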

How to save a website to JPG

This is very similar to approach 1. You need to replace the “wkhtmltopdf” command with “wkhtmltoimage”.

On Debian-based systems wkhtmltoimage is shipped in the same package as wkhtmltopdf, so if you have not installed it yet, run:

sudo apt install wkhtmltopdf

Then create a file and paste the following script:

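#!/bin/bash
# Download pages 1 to 10 and save each one as a JPG image.
for i in {1..10}
do
  declare url="https://someurl.com/page_$i.xhtml"
  # The output name must end in .jpg so that the files are saved as JPG images.
  wkhtmltoimage --width 900 --height 1280 --zoom 1.6 "$url" "output_file_$i.jpg"
done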

As before, we download 10 pages, but this time each one is saved as a JPG file. You can control the size of the image by adjusting the --width, --height and --zoom parameters.

The title of this post is “How to save a full website to PDF or JPG”, so you may be wondering by now: what do I do with multiple PDF files?

We have an answer to that too!

Just open your terminal in the folder where you downloaded the pages and run the following command:

pdfunite $(ls -v *.pdf) output.pdf

pdfunite (part of the poppler-utils package on Debian-based systems) is the tool we use to join multiple PDF files into one. Adding ls -v makes sure that the files are joined in the right numeric order, with page 2 before page 10 🙂
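To see why the ordering matters, and to avoid picking up an output.pdf left over from an earlier run, the merge step can be wrapped in a small function. This is a sketch under a few assumptions: the function names natural_order and merge_pages are our own, the pages are assumed to be named output_file_N.pdf as in approach 1, and it relies on GNU sort -V and the Bash 4 mapfile builtin.

```shell
#!/bin/bash
# Sketch only: natural_order and merge_pages are illustrative helper names.

# Print the given file names in natural (version) order, one per line.
natural_order() {
  printf '%s\n' "$@" | sort -V
}

# A plain alphabetical sort would put page 10 before page 2.
natural_order output_file_10.pdf output_file_2.pdf output_file_1.pdf
# prints:
#   output_file_1.pdf
#   output_file_2.pdf
#   output_file_10.pdf

# Merge every output_file_*.pdf in the current folder into file $1.
# Matching output_file_*.pdf instead of *.pdf keeps the merged file
# itself out of the input list when the command is run a second time.
merge_pages() {
  local merged=$1
  local -a pages
  mapfile -t pages < <(natural_order output_file_*.pdf)
  pdfunite "${pages[@]}" "$merged"
}
```

Calling merge_pages output.pdf in the download folder then joins the pages in numeric order. File names containing newlines are not handled by this sketch.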

That’s it for now. If you have any questions, please leave a comment below.

Published by
Kamil