TIKA文件格式 - Tika教程
Tika支持的文件格式
下面的表显示了Tika支持的文件格式。
| 文件格式 | 类库 | Tika中的类 | 
|---|---|---|
| XML | org.apache.tika.parser.xml | XMLParser | 
| HTML | org.apache.tika.parser.htmll and it uses Tagsoup Library | HtmlParser | 
| MS-Office compound document Ole2 till 2007 ooxml 2007 onwards | org.apache.tika.parser.microsoftorg.apache.tika.parser.microsoft.ooxml and it uses Apache Poi library | OfficeParser(ole2)OOXMLParser(ooxml) | 
| OpenDocument Format openoffice | org.apache.tika.parser.odf | OpenOfficeParser | 
| portable Document Format(PDF) | org.apache.tika.parser.pdf and this package uses Apache PdfBox library | PDFParser | 
| Electronic Publication Format (digital books) | org.apache.tika.parser.epub | EpubParser | 
| Rich Text format | org.apache.tika.parser.rtf | RTFParser | 
| Compression and packaging formats | org.apache.tika.parser.pkg and this package uses Common compress library | PackageParser and CompressorParser and its sub-classes | 
| Text format | org.apache.tika.parser.txt | TXTParser | 
| Feed and syndication formats | org.apache.tika.parser.feed | FeedParser | 
| Audio formats | org.apache.tika.parser.audio and org.apache.tika.parser.mp3 | AudioParser MidiParser Mp3- for mp3parser | 
| Imageparsers | org.apache.tika.parser.jpeg | JpegParser-for jpeg images | 
| Videoformats | org.apache.tika.parser.mp4 and org.apache.tika.parser.video this parser internally uses Simple Algorithm to parse flash video formats | Mp4parser FlvParser | 
| java class files and jar files | org.apache.tika.parser.asm | ClassParser CompressorParser | 
| Mobxformat (email messages) | org.apache.tika.parser.mbox | MobXParser | 
| Cad formats | org.apache.tika.parser.dwg | DWGParser | 
| FontFormats | org.apache.tika.parser.font | TrueTypeParser | 
| executable programs and libraries | org.apache.tika.parser.executable | ExecutableParser |