I have a Python script that converts PDF files to Excel files. The PDF files are in the same folder as the script.
Here is the code:
import os
import pandas as pd
from PyPDF2 import PdfReader
from tabula.io import read_pdf

def pdf_to_excel(pdf_file):
    # Use PdfReader to open the PDF file
    with open(pdf_file, 'rb') as file:
        pdf_reader = PdfReader(file)

        # Initialize an empty list to store DataFrames
        all_tables = []

        # Iterate through pages and extract tables
        for page_number in range(len(pdf_reader.pages)):
            # Check if the page contains tables using tabula
            tables = read_pdf(pdf_file, pages=page_number+1, multiple_tables=True)

            # Convert each table to a DataFrame and append to the list
            for table in tables:
                df = pd.DataFrame(table)
                all_tables.append(df)

    # Concatenate all DataFrames into a single DataFrame
    final_df = pd.concat(all_tables, ignore_index=True)

    # Save the final DataFrame to a single Excel file
    excel_file = os.path.splitext(pdf_file)[0] + "_merged_tables.xlsx"
    final_df.to_excel(excel_file, index=False)
    print(f"All tables merged and saved to {excel_file}")

if __name__ == "__main__":
    # Get the list of PDF files in the same folder
    pdf_files = [file for file in os.listdir() if file.lower().endswith('.pdf')]

    if not pdf_files:
        print("No PDF files found in the folder.")
    else:
        for pdf_file in pdf_files:
            pdf_to_excel(pdf_file)

    input("Press Enter to exit...")
Python is not installed on the client's computer, so I converted the script to an exe file with PyInstaller:
pip install pyinstaller
pyinstaller --onefile code.py
But when I run the exe file, it shows this error:
Error importing jpype dependencies. Fallback to subprocess.
No module named 'jpype'
Error from tabula-java:
Error: Unable to access jarfile C:\Users\achak\AppData\Local\Temp\_MEI137762\tabula\tabula-1.0.5-jar-with-dependencies.jar
Traceback (most recent call last):
File "code.py", line 40, in <module>
File "code.py", line 17, in pdf_to_excel
File "tabula\io.py", line 395, in read_pdf
File "tabula\io.py", line 82, in _run
File "tabula\backend.py", line 108, in call_tabula_java
File "subprocess.py", line 571, in run
subprocess.CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar', 'C:\\Users\\achak\\AppData\\Local\\Temp\\_MEI137762\\tabula\\tabula-1.0.5-jar-with-dependencies.jar', '--pages', '1', '--guess', '--format', 'JSON', 'file.pdf']' returned non-zero exit status 1.
[13140] Failed to execute script 'code' due to unhandled exception!
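
The "Unable to access jarfile" message suggests that the tabula jar may not have been unpacked into the temporary _MEI folder. One way to check this is the minimal sketch below, which lists any .jar files under the `tabula` subfolder of the PyInstaller bundle. The helper names `bundle_base_dir` and `find_tabula_jars` are hypothetical and made up for illustration. The sketch assumes the jar is expected at `<bundle>/tabula/`, as the path in the error message indicates.

```python
import sys
from pathlib import Path
from typing import List


def bundle_base_dir() -> Path:
    """Return the folder PyInstaller unpacked to, or the current folder otherwise."""
    # PyInstaller sets sys._MEIPASS inside a bundled app; it is absent in a normal run.
    return Path(getattr(sys, "_MEIPASS", Path.cwd()))


def find_tabula_jars(base_dir: Path) -> List[Path]:
    """List .jar files directly under <base_dir>/tabula (hypothetical helper)."""
    tabula_dir = Path(base_dir) / "tabula"
    if not tabula_dir.is_dir():
        return []
    return sorted(tabula_dir.glob("*.jar"))


if __name__ == "__main__":
    base = bundle_base_dir()
    jars = find_tabula_jars(base)
    if jars:
        print(f"Found jar(s) under {base}: {[jar.name for jar in jars]}")
    else:
        print(f"No tabula jar found under {base}; it was probably not bundled.")
```

If the jar turns out to be missing, PyInstaller's `--collect-data tabula` option (or `--add-data`) is a documented way to include a package's data files in the build. Whether that resolves this particular error has not been tested here.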