Tutorial: Converting PDF to HTML | WORD | EXCEL | PPT | PNG | TXT in Java

Date: 2023-09-12 08:44:07

Downloading Aspose.PDF from Maven

Adding the following configuration to pom.xml lets you use Aspose.PDF for Java directly in a Maven-based project.

<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>/repo/</url>
</repository>

<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-pdf</artifactId>
    <version>22.4</version>
</dependency>

Core implementation (single class)

import com.aspose.pdf.Document;
import com.aspose.pdf.SaveFormat;
import com.aspose.pdf.TextAbsorber;
import com.aspose.pdf.devices.PngDevice;
import com.aspose.pdf.devices.Resolution;

import java.io.*;

public class PDFHelper3 {

    public static void main(String[] args) throws IOException {
        pdf2txt("C:\\Users\\liuya\\Desktop\\pdf\\示例文件.pdf");
    }

    // Convert to Word (.docx)
    public static void pdf2word(String pdfPath) {
        long old = System.currentTimeMillis();
        try {
            String wordPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".docx";
            // doc is the PDF document to be converted
            Document doc = new Document(pdfPath);
            // try-with-resources closes the stream even if saving fails
            try (FileOutputStream os = new FileOutputStream(wordPath)) {
                doc.save(os, SaveFormat.DocX);
            }
            // Elapsed time
            long now = System.currentTimeMillis();
            System.out.println("PDF to Word took " + ((now - old) / 1000.0) + " s");
        } catch (Exception e) {
            System.out.println("PDF to Word failed...");
            e.printStackTrace();
        }
    }

    // Convert to PowerPoint (.pptx)
    public static void pdf2ppt(String pdfPath) {
        long old = System.currentTimeMillis();
        try {
            String pptPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".pptx";
            Document doc = new Document(pdfPath);
            try (FileOutputStream os = new FileOutputStream(pptPath)) {
                doc.save(os, SaveFormat.Pptx);
            }
            long now = System.currentTimeMillis();
            System.out.println("PDF to PPT took " + ((now - old) / 1000.0) + " s");
        } catch (Exception e) {
            System.out.println("PDF to PPT failed...");
            e.printStackTrace();
        }
    }

    // Convert to Excel (.xlsx)
    public static void pdf2excel(String pdfPath) {
        long old = System.currentTimeMillis();
        try {
            String excelPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".xlsx";
            Document doc = new Document(pdfPath);
            try (FileOutputStream os = new FileOutputStream(excelPath)) {
                doc.save(os, SaveFormat.Excel);
            }
            long now = System.currentTimeMillis();
            System.out.println("PDF to Excel took " + ((now - old) / 1000.0) + " s");
        } catch (Exception e) {
            System.out.println("PDF to Excel failed...");
            e.printStackTrace();
        }
    }

    // Convert to HTML
    public static void pdf2Html(String pdfPath) {
        long old = System.currentTimeMillis();
        try {
            String htmlPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".html";
            Document doc = new Document(pdfPath);
            doc.save(htmlPath, SaveFormat.Html);
            long now = System.currentTimeMillis();
            System.out.println("PDF to HTML took " + ((now - old) / 1000.0) + " s");
        } catch (Exception e) {
            System.out.println("PDF to HTML failed...");
            e.printStackTrace();
        }
    }

    // Convert to images: one PNG per page, written to "<name>_images"
    public static void pdf2image(String pdfPath) {
        long old = System.currentTimeMillis();
        try {
            Resolution resolution = new Resolution(300);
            String dataDir = pdfPath.substring(0, pdfPath.lastIndexOf("."));
            File imageDir = new File(dataDir + "_images");
            if (!imageDir.exists()) {
                imageDir.mkdirs();
            }
            Document doc = new Document(pdfPath);
            PngDevice pngDevice = new PngDevice(resolution);
            // Page numbers start at 1
            for (int pageNumber = 1; pageNumber <= doc.getPages().size(); pageNumber++) {
                File imageFile = new File(imageDir, pageNumber + ".png");
                try (OutputStream imageStream = new FileOutputStream(imageFile)) {
                    pngDevice.process(doc.getPages().get_Item(pageNumber), imageStream);
                }
            }
            long now = System.currentTimeMillis();
            System.out.println("PDF to PNG took " + ((now - old) / 1000.0) + " s");
        } catch (Exception e) {
            System.out.println("PDF to PNG failed...");
            e.printStackTrace();
        }
    }

    // Convert to plain text
    public static void pdf2txt(String pdfPath) {
        long old = System.currentTimeMillis();
        Document pdfDocument = new Document(pdfPath);
        // TextAbsorber collects the text of every page it visits
        TextAbsorber ta = new TextAbsorber();
        ta.visit(pdfDocument);
        String txtPath = pdfPath.substring(0, pdfPath.lastIndexOf(".")) + ".txt";
        // FileWriter uses the platform default charset here
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(txtPath))) {
            writer.write(ta.getText());
        } catch (IOException e) {
            System.out.println("PDF to TXT failed...");
            e.printStackTrace();
            return;
        }
        long now = System.currentTimeMillis();
        System.out.println("PDF to TXT took " + ((now - old) / 1000.0) + " s");
    }
}
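
Every method above derives its output path with pdfPath.substring(0, pdfPath.lastIndexOf(".")). That expression throws when the path contains no dot, and it cuts at the wrong place when only a parent directory contains one. The following is a minimal sketch of a more defensive helper; the class name OutputPaths and the method withExtension are hypothetical and not part of the original code or of Aspose.PDF.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original class or of Aspose.PDF.
final class OutputPaths {
    private OutputPaths() {}

    /**
     * Returns pdfPath with the extension of its file name replaced by newExt
     * (for example ".docx"). Only the last path element is inspected, so dots
     * in parent directories are ignored. Assumes pdfPath names a file, not a
     * file-system root.
     */
    static String withExtension(String pdfPath, String newExt) {
        Path path = Paths.get(pdfPath);
        String fileName = path.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        // dot > 0 keeps names such as ".hidden" intact and handles "no dot at all"
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        return path.resolveSibling(base + newExt).toString();
    }

    public static void main(String[] args) {
        System.out.println(withExtension("report.pdf", ".docx"));
        System.out.println(withExtension("report", ".txt"));
    }
}
```

With such a helper, a line like String wordPath = OutputPaths.withExtension(pdfPath, ".docx"); could replace the substring expression in each conversion method.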

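To run it, right-click the class in IntelliJ IDEA and choose Run. To turn it into a web system, wrap the code in a web service and call these methods from it.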
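
One way to picture the web-service wrapping is the sketch below, which uses the JDK's built-in com.sun.net.httpserver package and assumes Java 11 or later. Everything in it is an assumption for illustration: the class name ConvertService, the /convert endpoint, the format and path query parameters, and the port are all hypothetical. The converter registered in main is a placeholder; in the real project it would be a method reference such as PDFHelper3::pdf2word.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical wrapper: maps "format" names to conversion methods.
final class ConvertService {
    private ConvertService() {}

    /** Splits "a=1&b=2" into a map, URL-decoding keys and values. */
    static Map<String, String> parseQuery(String rawQuery) {
        Map<String, String> params = new HashMap<>();
        if (rawQuery == null) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8),
                           URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
            }
        }
        return params;
    }

    /** Runs the converter named by the query and returns the HTTP status to send. */
    static int dispatch(Map<String, Consumer<String>> converters, String rawQuery) {
        Map<String, String> query = parseQuery(rawQuery);
        String format = query.get("format");
        String path = query.get("path");
        if (format == null || path == null || !converters.containsKey(format)) {
            return 400; // unknown format or missing parameter
        }
        try {
            converters.get(format).accept(path);
            return 200;
        } catch (RuntimeException e) {
            return 500; // the converter threw
        }
    }

    /** Starts a server that answers GET /convert?format=...&path=... */
    static HttpServer start(int port, Map<String, Consumer<String>> converters) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/convert", exchange -> {
            int status = dispatch(converters, exchange.getRequestURI().getRawQuery());
            respond(exchange, status, status == 200 ? "converted" : "failed");
        });
        server.start();
        return server;
    }

    private static void respond(HttpExchange exchange, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Consumer<String>> converters = new HashMap<>();
        // Placeholder; the real project would register PDFHelper3::pdf2word and so on.
        converters.put("word", p -> System.out.println("would convert " + p + " to Word"));
        start(8080, converters);
    }
}
```

Two caveats apply to this sketch. The conversion methods in the class above catch their own exceptions and only print a message, so a service built on them would report success even when a conversion failed unless they are changed to rethrow. Accepting an arbitrary file path from a request is also unsafe outside a trusted environment; a real service would more likely accept an uploaded file.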

Conversion results

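Taking a "14" PDF file as an example (the original omits the unit, so it may mean pages or megabytes), most conversions took 10-12 s; only the PPT conversion was slower, at about 20 s. Probably because the PDF does not contain tabular content, the converted Excel file differs noticeably in style from the source. The other formats keep the same layout as the original after conversion.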
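
The timings above come from the System.currentTimeMillis() pattern repeated in every method. As a hedged sketch, the measurement could be factored into one helper; the class name Timing and its methods are hypothetical, and System.nanoTime() is swapped in because it is monotonic and better suited to measuring elapsed time than the wall clock.

```java
// Hypothetical helper, not part of the original class.
final class Timing {
    private Timing() {}

    /** Converts a nanosecond interval to seconds. */
    static double toSeconds(long nanos) {
        return nanos / 1_000_000_000.0;
    }

    /** Runs the task once and returns how long it took, in seconds. */
    static double measureSeconds(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return toSeconds(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder workload; the real call would be something like
        // () -> PDFHelper3.pdf2word(pdfPath)
        double seconds = measureSeconds(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
        });
        System.out.printf("Conversion took %.3f s%n", seconds);
    }
}
```

A single run is only a rough indication, since JVM warm-up and disk caching can shift the first measurement noticeably.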
