Newspaper is a Python 3 module for article scraping and curation.
Newspaper can detect an article's language and extract its content accordingly. If no language is specified, Newspaper will attempt to auto-detect it.
The features include:
- Multi-threaded article download framework
- News URL identification
- Text extraction from HTML
- Top image extraction from HTML
- All image extraction from HTML
- Keyword extraction from text
- Summary extraction from text
- Author extraction from text
- Google trending terms extraction
- Works in 10+ languages (English, Chinese, German, Arabic, ...)
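As a rough illustration of several of the features above, here is a minimal sketch of how a single article might be downloaded and turned into a plain dictionary. It assumes the `newspaper3k` package is installed and uses its `Article` class with the `download()`, `parse()` and `nlp()` methods. The helper names `fetch_article` and `article_to_record` are made up for this example and are not part of the library.

```python
def fetch_article(url, language=None):
    """Download and parse one article (assumes `pip install newspaper3k`)."""
    # Imported lazily so article_to_record below can be used without the library.
    from newspaper import Article

    # Only pass a language when one is given; otherwise Newspaper
    # attempts to auto-detect it.
    kwargs = {"language": language} if language else {}
    article = Article(url, **kwargs)
    article.download()  # fetch the raw HTML
    article.parse()     # extract title, authors, text, images
    return article


def article_to_record(article, with_nlp=False):
    """Copy the extracted fields of a parsed article into a dict.

    Hypothetical helper: it accepts any object exposing the same
    attributes as a parsed newspaper Article.
    """
    record = {
        "title": article.title,
        "authors": list(article.authors),
        "top_image": article.top_image,
        "images": sorted(article.images),
        "text": article.text,
    }
    if with_nlp:
        # Keyword and summary extraction run in a separate nlp() step,
        # which may need extra NLTK tokenizer data to be downloaded first.
        article.nlp()
        record["keywords"] = list(article.keywords)
        record["summary"] = article.summary
    return record
```

A call such as `article_to_record(fetch_article("https://example.com/some-news-story"))` would then return the extracted fields; the URL here is only a placeholder, and the call needs network access.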
https://github.com/codelucas/newspaper
https://www.kdnuggets.com/2021/10/simple-text-scraping-parsing-processing-python-library.html