Extract references (pdf, url, doi, arxiv) and metadata from a PDF. Optionally download all referenced PDFs and check for broken links.
Features
- Extract references and metadata from a given PDF
- Detects pdf, url, arxiv and doi references
- Fast, parallel download of all referenced PDFs
- Find broken hyperlinks (using the -c flag)
- Output as text or JSON (using the -j flag)
- Extract the PDF text (using the --text flag)
- Use as command-line tool or Python package
- Compatible with Python 2 and 3
- Works with local and online PDFs
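
To give a feel for what detecting pdf, url, arxiv and doi references involves, here is a minimal sketch that classifies references in text that has already been extracted from a PDF. It is not the tool's own implementation: the function name `classify_references` and the three regular expressions are assumptions made for illustration, and they are deliberately simplified (real DOIs and URLs have many more edge cases).

```python
import re

# Illustrative patterns only (assumptions, not taken from the tool itself).
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)
ARXIV_RE = re.compile(r"arxiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE)


def classify_references(text):
    """Group references found in plain text by type (hypothetical helper)."""
    refs = {"pdf": set(), "url": set(), "doi": set(), "arxiv": set()}

    for match in URL_RE.finditer(text):
        # Drop sentence punctuation that got glued to the end of the link.
        url = match.group(0).rstrip(".,;")
        # Treat URLs whose path ends in .pdf as PDF references.
        if url.lower().split("?", 1)[0].endswith(".pdf"):
            refs["pdf"].add(url)
        else:
            refs["url"].add(url)

    for match in DOI_RE.finditer(text):
        refs["doi"].add(match.group(0).rstrip(".,;"))

    for match in ARXIV_RE.finditer(text):
        # Keep only the identifier, without the "arXiv:" prefix.
        refs["arxiv"].add(match.group(1))

    # Sorted lists make the result stable and easy to serialize as JSON.
    return {kind: sorted(found) for kind, found in refs.items()}
```

Calling `classify_references(page_text)` returns a dictionary with the keys `pdf`, `url`, `doi` and `arxiv`, each holding a sorted list of matches. Using sets internally removes duplicates, which matters because the same link often appears several times in a paper.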