Why does my docx converter return "None"?

pywin32 ms-word pdf django python

tthheemmaannii · Tech Community · 2 years ago

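I'm new to web development and am trying to build a language-translation app in Django that translates uploaded documents. It relies on a chain of conversions between PDF and docx. When my code outputs the translated document, the file cannot be opened.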

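When I checked the file type, it was identified as XML and docx. If I change the extension to .docx, MS Word can open and read it (but no PDF reader can).
When I inspect the result in my Python code by printing its type and its contents, I get NoneType and None.
A working PDF of the file does get written to the mysite/mysite folder, but the file that my reConverter function sends to the browser is the broken one.
I tried converting it manually using: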
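A quick way to check what the browser actually received is to look at the file's first bytes rather than its extension. A real PDF starts with `%PDF-`, while a .docx is a ZIP archive (first bytes `PK\x03\x04`) made of XML parts, which would fit the file being recognised as XML/docx. This is a minimal standard-library sketch; `sniff_file_type` and `SIGNATURES` are illustrative names, not part of the question's code.

```python
# Leading "magic" bytes of the formats involved. A .docx is a ZIP archive of
# XML parts, which is why tools may report it as XML, ZIP or docx.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip (docx/xlsx/pptx)",
}


def sniff_file_type(path):
    """Return a rough file type based on the first bytes of the file."""
    with open(path, "rb") as f:  # binary mode: we compare raw bytes
        head = f.read(8)
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return "unknown"
```

Running it on both the file in mysite/mysite and the downloaded file shows immediately whether the download is a PDF at all.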

import win32com.client

wdFormatPDF = 17  # Word's WdSaveFormat constant for PDF

wordObj = win32com.client.Dispatch('Word.Application')
docObj = wordObj.Documents.Open(wordFilename)
docObj.SaveAs(pdfFilename, FileFormat=wdFormatPDF)
docObj.Close()
wordObj.Quit()

but got a CoInitialization error. So I have narrowed the problem down to the reConverter function, which returns NoneType. Here is my code:

from django.shortcuts import render
from django.http import HttpResponse
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_protect
from .models import TranslatedDocument
from .forms import UploadFileForm
from django.core.files.storage import FileSystemStorage
import docx
from pdf2docx import parse
from docx2pdf import convert
import time #remove


# Create your views here.

#pythoncom.CoInitialize()
@csrf_protect
def translateFile(request):
    if request.method == 'POST':
        form = UploadFileForm(request.POST, request.FILES)
        if form.is_valid():
            uploaded_file = request.FILES['file']
            fs = FileSystemStorage()
            filename = fs.save(uploaded_file.name, uploaded_file)
            uploaded_file_path = fs.path(filename)

            file = converter(uploaded_file_path)
            # Build the response only when a file was processed; an invalid
            # form falls through and is re-rendered with its errors
            response = HttpResponse(file, content_type='application/pdf')
            response['Content-Disposition'] = 'attachment; filename="' + filename + '"'
            return response

    else:
        form = UploadFileForm()
    return render(request, 'translator_upload/upload.html', {'form': form})


def reConverter(inputDocxPath):
    #reconvert docx to pdf
    
    print('reConverter: '+str(inputDocxPath))
    outputPdfPath = inputDocxPath.replace('.docx', '.pdf')
    test = convert(inputDocxPath, outputPdfPath)
    print(type(test))
    print('test: '+str(test))
    return test

def translateDocx(aDocx, stringOfDocPath):
    #translation logic
    docx_file = stringOfDocPath
    myDoc = docx.Document(docx_file)
    print('translateDocx: '+str(docx_file))
    print('translateDocx: '+str(myDoc))
    for paragraphNum in range(len(myDoc.paragraphs)):
        # TRANSLATION LOGIC (omitted here)
        pass

    myDoc.save(docx_file)
    return reConverter(docx_file)


    
#stringOfDocPath is used as convert() requires file path, not file object(myDoc)

def converter(inputPdfPath):
    # convert pdf to docx
    
    pdf_file = inputPdfPath
    docx_file = inputPdfPath.replace('.pdf', '.docx')
    print('file types saved: '+docx_file+'. Converting to docx')


    parse(pdf_file, docx_file) #,  start=0, end=3)
    myDoc = docx.Document(docx_file)
    print('converter '+str(myDoc))
    return translateDocx(myDoc, docx_file)
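The CoInitialization error from the manual attempt, and the commented-out `#pythoncom.CoInitialize()` at module level, point to a common cause: COM must be initialized in every thread that uses it, and Django's development server handles each request in its own thread, so one call at import time does not cover the request thread. The sketch below assumes Windows with Word and pywin32 installed; `com_initialized` and `docx_to_pdf_with_word` are hypothetical helper names. As far as I know, docx2pdf also drives Word through COM on Windows, so the same per-thread initialization around its `convert()` call should apply.

```python
import os
from contextlib import contextmanager

WD_FORMAT_PDF = 17  # Word's WdSaveFormat value for PDF (wdFormatPDF)


@contextmanager
def com_initialized():
    """Initialize COM for the current thread and release it afterwards.

    Windows-only (needs pywin32). Each request thread needs its own call.
    """
    import pythoncom  # imported lazily so this module still imports elsewhere

    pythoncom.CoInitialize()
    try:
        yield
    finally:
        pythoncom.CoUninitialize()


def docx_to_pdf_with_word(docx_path, pdf_path):
    """Convert docx_path to pdf_path by automating Microsoft Word over COM."""
    import win32com.client  # Windows-only dependency, imported lazily

    docx_path = os.path.abspath(docx_path)  # Word resolves relative paths oddly
    pdf_path = os.path.abspath(pdf_path)
    with com_initialized():
        word = win32com.client.Dispatch("Word.Application")
        try:
            doc = word.Documents.Open(docx_path)
            try:
                doc.SaveAs(pdf_path, FileFormat=WD_FORMAT_PDF)
            finally:
                doc.Close()
                doc = None  # release the proxy while COM is still initialized
        finally:
            word.Quit()
            word = None  # likewise for the Word application proxy
    return pdf_path
```

With docx2pdf, the equivalent change would be wrapping the existing call as `with com_initialized(): convert(inputDocxPath, outputPdfPath)` inside the request.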


Melphin · 2 years ago

docx2pdf.convert always returns None.

Instead, it writes the converted PDF to the file at outputPdfPath.

To send the PDF to the user, you have to read it back from outputPdfPath.

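from docx2pdf import convert

def reConverter(inputDocxPath):
    # reconvert docx to pdf; convert() writes the file and returns None
    print('reConverter: ' + str(inputDocxPath))
    outputPdfPath = inputDocxPath.replace('.docx', '.pdf')
    convert(inputDocxPath, outputPdfPath)
    # Read in binary mode: a PDF is not text, so "r" can fail to decode it
    # or corrupt the bytes sent to the browser
    with open(outputPdfPath, "rb") as f:
        pdfBytes = f.read()
    print(type(pdfBytes))
    print('PDF size: ' + str(len(pdfBytes)) + ' bytes')
    return pdfBytes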
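To see this contract without Word installed, here is a sketch that swaps docx2pdf for a hypothetical stand-in, `fake_convert`, which, like `convert`, writes to the output path and returns None. `re_converter` (also an illustrative name) returns the bytes it reads back. It uses `pathlib`'s `with_suffix` instead of `str.replace`, because `replace('.docx', '.pdf')` would also rewrite a `.docx` that happens to appear in a directory name.

```python
from pathlib import Path


def fake_convert(input_path, output_path):
    """Hypothetical stand-in for docx2pdf.convert: writes the output, returns None."""
    Path(output_path).write_bytes(b"%PDF-1.7\n% placeholder\n%%EOF\n")


def re_converter(input_docx_path, convert_fn=fake_convert):
    """Convert a .docx to .pdf and return the PDF's bytes.

    The converter's return value is ignored on purpose: it is None, and the
    result only exists as the file written to output_pdf_path.
    """
    output_pdf_path = Path(input_docx_path).with_suffix(".pdf")
    convert_fn(str(input_docx_path), str(output_pdf_path))
    return output_pdf_path.read_bytes()  # binary read, since a PDF is not text


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        docx_path = Path(tmp) / "report.docx"
        docx_path.write_bytes(b"")  # the stand-in never reads its input
        print(fake_convert(docx_path, Path(tmp) / "other.pdf"))  # None
        print(re_converter(docx_path)[:5])  # b'%PDF-'
```

In the real app you would pass `convert_fn=convert` from docx2pdf, hand the returned bytes to `HttpResponse(pdf_bytes, content_type='application/pdf')`, and make the download filename end in `.pdf`.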