Tuesday, February 8, 2011

"pdftohtml" vs. DRM

A project of mine involves extracting strings and other details from PDF files using "pdftohtml -xml".

A plain "pdftohtml -xml" refuses to read PDF files that have copy-protection bits set. If you add "-nodrm" on the command line, it reads them anyway and reports the problem on STDERR.
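
For scripted use, something like the following might work. It is only a rough sketch in Python, assuming "pdftohtml" is on the PATH; the helper names build_command and convert are made up for illustration. The STDERR text is handed back to the caller rather than parsed, since the exact wording of the message is not something to rely on.

```python
import subprocess


def build_command(pdf_path, out_base, ignore_drm=False):
    """Build the pdftohtml argument list (hypothetical helper)."""
    cmd = ["pdftohtml", "-xml"]
    if ignore_drm:
        # -nodrm asks pdftohtml to read the file despite its protection bits
        cmd.append("-nodrm")
    cmd += [pdf_path, out_base]
    return cmd


def convert(pdf_path, out_base, ignore_drm=False):
    """Run pdftohtml and return (exit code, STDERR text) to the caller."""
    result = subprocess.run(
        build_command(pdf_path, out_base, ignore_drm),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr
```

Calling convert("input.pdf", "output", ignore_drm=True) would then run "pdftohtml -xml -nodrm input.pdf output" and return whatever was written to STDERR, so the caller can log it or decide what to do about protected files.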
