Extract references (pdf, url, doi, arxiv) and metadata from a PDF. Optionally download all referenced PDFs and check for broken links.
Features
- Extract references and metadata from a given PDF
- Detects pdf, url, arxiv and doi references
- Fast, parallel download of all referenced PDFs
- Find broken hyperlinks (using the -c flag)
- Output as text or JSON (using the -j flag)
- Extract the PDF text (using the --text flag)
- Use as command-line tool or Python package
- Compatible with Python 2 and 3
- Works with local and online PDFs
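
To illustrate how the pdf, url, arxiv and doi references listed above might be detected once the text of a PDF is available, here is a minimal sketch using regular expressions. It is not the tool's actual implementation: the function name `find_references`, the patterns, and the sample text are all assumptions made for illustration, and real-world references come in more variants than these simplified patterns cover.

```python
import re

# Simplified, illustrative patterns (assumptions, not the tool's own rules).
ARXIV_RE = re.compile(r"arxiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE)
DOI_RE = re.compile(r"\b(10\.\d{4,9}/[^\s\"<>]+)")
URL_RE = re.compile(r"https?://[^\s\"<>)]+", re.IGNORECASE)


def find_references(text):
    """Hypothetical helper: map each reference type to a sorted list of unique matches."""
    # Strip trailing sentence punctuation that the greedy patterns pick up.
    urls = set(m.rstrip(".,;") for m in URL_RE.findall(text))
    # Treat URLs ending in .pdf as PDF references, the rest as plain URLs.
    pdfs = set(u for u in urls if u.lower().endswith(".pdf"))
    return {
        "arxiv": sorted(set(ARXIV_RE.findall(text))),
        "doi": sorted(set(m.rstrip(".,;") for m in DOI_RE.findall(text))),
        "pdf": sorted(pdfs),
        "url": sorted(urls - pdfs),
    }


# Made-up sample text standing in for text extracted from a PDF.
sample = (
    "See arXiv:1706.03762 and doi:10.1000/xyz123. "
    "Slides: https://example.com/talk.pdf, code: https://example.com/repo."
)
refs = find_references(sample)
print(refs)
```

Splitting the results by type keeps the later steps simple: the PDF entries are the candidates for downloading, while every URL, PDF or not, is a candidate for a broken-link check.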