
How to Detect and Remove Embedded Malware in PDF Files

PDFs can carry viruses. Learn how attackers hide malicious code in documents and how to sanitize them.


We trust PDFs. We open them without thinking. That’s why hackers love them. A PDF is not just a static page; it’s a container that can hold JavaScript, embedded file attachments, and launch actions.

1. The Threat: JavaScript

Just like a website, a PDF can run JavaScript.

  • The Attack: You open the PDF, and the script runs automatically because the document’s /OpenAction entry points to it. The script then tries to connect to a malicious server or to exploit a vulnerability in your PDF reader (such as Adobe Acrobat) to take over your PC.
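To make the /OpenAction mechanism concrete, here is a minimal Python sketch (standard library only) that writes a one-page PDF whose catalog points at a JavaScript action, so a JavaScript-enabled reader would run it as soon as the file is opened. The function name build_openaction_pdf is hypothetical, and the default payload is a harmless app.alert call, so the output is only meant as a test fixture for the detection and sanitization steps discussed below.

```python
def build_openaction_pdf(script: str = "app.alert('hello');") -> bytes:
    """Return a tiny one-page PDF whose catalog runs `script` when opened.

    Meant as a harmless test fixture for scanners and sanitizers.
    """
    # Literal strings in PDF must escape backslashes and parentheses.
    escaped = script.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    objects = [
        # 1: document catalog; /OpenAction points at the action in object 4
        b"<< /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >>",
        # 2: page tree containing a single page
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # 3: an empty US Letter page
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
        # 4: the JavaScript action a JavaScript-enabled reader runs on open
        b"<< /Type /Action /S /JavaScript /JS (" + escaped.encode("latin-1") + b") >>",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", recorded in the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    size = len(objects) + 1  # object 0 is the head of the free list
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos  # "%%%%" formats to "%%"
    return bytes(out)
```

Writing the result with `open("openaction-test.pdf", "wb").write(build_openaction_pdf())` gives a file you can feed to a scanner. Object 4 holds the script; it is the /OpenAction reference in object 1 (the catalog) that makes it run without any click.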

2. The Threat: Launch Actions

A PDF can contain a /Launch action that tells the reader to start an external program such as /bin/sh or cmd.exe. Modern readers warn you (“This document is trying to run a program”), but users often click “Allow” out of habit.
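Both threats leave recognizable names in the file structure, so a first-pass triage can count them without ever opening the document in a reader. The sketch below (Python, standard library only; the name list and the function names are assumptions for illustration) counts /JS, /JavaScript, /OpenAction, /AA, /Launch and /EmbeddedFile, and decodes #xx escapes so that an obfuscated name like /J#61vaScript is still caught. It scans raw bytes, so names hidden inside compressed streams or object streams are missed; decompressing the file first (for example with `qpdf --qdf --object-streams=disable in.pdf out.pdf`) makes it far more useful.

```python
import re
from collections import Counter

# Names tied to active content (an assumed triage list, not an exhaustive one).
SUSPICIOUS_NAMES = {"/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile"}

# A PDF name token: "/" followed by bytes that are not whitespace or delimiters.
NAME_RE = re.compile(rb"/[^\x00\s()<>\[\]{}/%]+")
# Inside a name, "#xx" encodes one byte as two hex digits (e.g. #61 is "a").
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_name(raw: bytes) -> str:
    """Decode #xx escapes so /J#61vaScript compares equal to /JavaScript."""
    decoded = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def scan_pdf_bytes(data: bytes) -> Counter:
    """Count suspicious names in the raw (uncompressed) bytes of a PDF."""
    counts: Counter = Counter()
    for match in NAME_RE.finditer(data):
        name = normalize_name(match.group())
        if name in SUSPICIOUS_NAMES:
            counts[name] += 1
    return counts
```

Run it as `scan_pdf_bytes(open("suspect.pdf", "rb").read())`. A nonzero count is a reason to look closer, not proof of malice: legitimate interactive forms often use /AA, for example.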

3. Detection and Sanitization

How do you clean a PDF?

  • Flattening: Convert the PDF to images and back to PDF. This kills all scripts but makes the text unselectable.
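  • Stripping: Use tools (such as qpdf or specialized security libraries) to remove the /JS, /JavaScript, and /AA (Additional Actions) entries, along with /OpenAction and /Launch actions, from the file structure. Unlike flattening, this keeps the text selectable.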

Conclusion

Never trust an unknown PDF. If you are building an app that accepts PDF uploads, you must sanitize them before showing them to other users.

Safe generation. MergeCanvas generates clean, standard PDFs from your data, ensuring no malicious code is ever inserted.