The Parser and Detector pages describe the main interfaces of Tika and how they work.
- Supported Document Formats
  - HyperText Markup Language
  - XML and derived formats
  - Microsoft Office document formats
  - OpenDocument Format
  - iWorks document formats
  - Portable Document Format
  - Electronic Publication Format
  - Rich Text Format
  - Compression and packaging formats
  - Text formats
  - Feed and Syndication formats
  - Help formats
  - Audio formats
  - Image formats
  - Video formats
  - Java class files and archives
  - Source code
  - Mail formats
  - CAD formats
  - Font formats
  - Scientific formats
  - Executable programs and libraries
  - Crypto formats
  - Database formats
  - Full list of Supported Formats
"A Python port of the Apache Tika library that makes Tika available using the Tika REST Server."
https://github.com/chrismattmann/tika-python
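Because tika-python works by talking to the Tika REST Server, it helps to see roughly what such a client sends over the wire. The sketch below uses only the Python standard library and is not tika-python's own code. It assumes a Tika Server is already running locally on its default port 9998, and it uploads the raw file bytes with an HTTP PUT to the server's `/tika` endpoint (text extraction) or `/detect/stream` endpoint (MIME type detection). The helper names `build_tika_request`, `extract_text` and `detect_type` are hypothetical, made up for this example.

```python
import urllib.request

# Assumption: a Tika Server is already running locally on its default port.
TIKA_SERVER = "http://localhost:9998"


def build_tika_request(data: bytes, endpoint: str = "/tika",
                       accept: str = "text/plain") -> urllib.request.Request:
    """Build (but do not send) a PUT request that uploads raw document bytes."""
    return urllib.request.Request(
        TIKA_SERVER + endpoint,
        data=data,
        method="PUT",
        headers={"Accept": accept},
    )


def extract_text(path: str) -> str:
    """Send a file to the server's /tika endpoint and return the extracted plain text."""
    with open(path, "rb") as f:
        req = build_tika_request(f.read())
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def detect_type(path: str) -> str:
    """Ask the server's /detect/stream endpoint for the file's MIME type."""
    with open(path, "rb") as f:
        req = build_tika_request(f.read(), endpoint="/detect/stream")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8").strip()
```

With tika-python itself you would normally skip the raw HTTP and call `parser.from_file(path)` (after `from tika import parser`), which returns a dictionary with `"metadata"` and `"content"` keys.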