Monday, November 8, 2021

newspaper

Newspaper is a Python 3 module for article scraping and curation.

Newspaper can extract articles in many languages and detect which language a page is written in. If no language is specified, Newspaper attempts to auto-detect it.

The features include:

  • Multi-threaded article download framework
  • News URL identification
  • Text extraction from HTML
  • Top image extraction from HTML
  • All image extraction from HTML
  • Keyword extraction from text
  • Summary extraction from text
  • Author extraction from text
  • Google trending terms extraction
  • Works in 10+ languages (English, Chinese, German, Arabic, ...)

https://github.com/codelucas/newspaper 

https://www.kdnuggets.com/2021/10/simple-text-scraping-parsing-processing-python-library.html 
