Questions tagged [pymupdf]
PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.
pymupdf
346
questions
0
votes
0
answers
21
views
Can't get form elements to align properly using pymupdf and reportlab
I'm writing a program that takes form files and turns them into interactive PDF forms by using markers to locate and draw form elements.
I've gotten my radio buttons to align on the page properly but ...
0
votes
0
answers
16
views
DocumentAI OCR Error: Invalid Document Content
I am calling DocumentAI OCR batch processing from Workflows, generally quite successfully; however, I occasionally get the following error:
{
"caughtError": {
"message": "...
0
votes
0
answers
10
views
Match text extracted with unstructured to parsing from PyMuPDF
I am parsing PDFs with unstructured for use in an application. Later I would like to highlight some of these extracted text passages in the actual PDF. I have seen that PyMuPDF has the capabilities to ...
0
votes
1
answer
32
views
PyMuPDF page.search_for(text) splits a line break into two completely different objects
I use PyMuPDF for document redaction. Sometimes I want to redact only one instance of a certain text on a page, so I use an index:
areas = page.search_for(text)
area = areas[index]
The problem is ...
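One plausible cause, stated as an assumption because the excerpt is cut off: `search_for` returns one rectangle per line, so a match that wraps over a line break yields two entries and shifts every later index. A minimal sketch of a workaround groups consecutive rectangles that look like one wrapped match. The helper `group_hits` and its tolerance are hypothetical, rectangles are plain `(x0, y0, x1, y1)` tuples, and the heuristic would wrongly merge two separate matches on adjacent lines when the second starts further left.

```python
def group_hits(rects, line_tol=2.0):
    """Group consecutive hit rectangles that look like one match wrapped over a line break.

    Heuristic (an assumption, not PyMuPDF behaviour): a rectangle continues the
    previous one if it starts below it, on the directly following line, and
    begins left of where the previous rectangle ended.
    """
    groups = []
    for rect in rects:
        x0, y0, x1, y1 = rect
        if groups:
            px0, py0, px1, py1 = groups[-1][-1]
            prev_height = py1 - py0
            starts_below = y0 >= py1 - line_tol      # not on the same line
            is_next_line = y0 - py1 < prev_height    # no large vertical gap
            starts_left = x0 < px1                   # text wrapped back to the left
            if starts_below and is_next_line and starts_left:
                groups[-1].append(rect)
                continue
        groups.append([rect])
    return groups


# Made-up coordinates: the second and third rectangles form one wrapped match.
hits = [
    (100, 100, 200, 112),
    (400, 130, 500, 142),
    (72, 144, 120, 156),
    (72, 300, 150, 312),
]
groups = group_hits(hits)
```

With real data the input could be built as `[tuple(r) for r in page.search_for(text)]`; indexing into `groups` instead of `areas` then selects a whole occurrence, and every rectangle in the chosen group would need to be redacted.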
0
votes
1
answer
63
views
I read a PDF file using Python, and part of the content is displayed as a string of garbled text. How should I restore it?
I read a PDF file using Python, and part of the content is displayed as a string of garbled text. How should I restore it?
import fitz
doc = fitz.open("2303.11366v4.pdf")  # download from ...
0
votes
0
answers
48
views
How to resolve cx_Freeze error about pdf2docx mupdf import?
I've never posted here before, but this time I'm genuinely stuck. I'm using cx_Freeze on a python script that uses pdf2docx. Running the python script normally works fine, but running cx_Freeze's ...
-1
votes
1
answer
40
views
Extracting Text from PDFs with Python Without Including Comments
I have been trying to extract text from PDF files to automate a significant and tedious part of my job using Python. With the help of ChatGPT, I have written multiple lines of code. However, I am ...
1
vote
2
answers
68
views
Handling malformed PDF with MuPDF
I am using MuPDF to read some PDF documents, and recently I have started getting some malformed PDFs coming from Google GSuite where the hex header in the beginning has some additional bytes added in ...
0
votes
0
answers
79
views
How do I replace embedded fonts in PDF
I have a use case where I extract embedded fonts from a PDF, modify them to add a Unicode mapping, and want to put them back.
I tried several approaches and everything fails.
Last attempt was using ...
0
votes
0
answers
24
views
Delete all footnote/endnote numbers throughout a PDF using PyMuPDF?
I am trying to delete all footnotes/endnotes in a pdf. They all have a font size of six, while the rest of the text has a font size of twelve.
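The size rule described here can be sketched as a filter over the nested dictionary that PyMuPDF produces for a page (blocks, then lines, then spans, each span carrying `size` and `bbox`). This is only an illustration under assumptions: the helper name `small_span_boxes`, the tolerance, and the sample data are made up, and real PDFs may report sizes such as 5.98 rather than exactly 6.

```python
def small_span_boxes(page_dict, target_size=6.0, tol=0.5):
    """Return the bounding boxes of all text spans whose font size is near target_size."""
    boxes = []
    for block in page_dict.get("blocks", []):
        # Image blocks have no "lines" key, so fall back to an empty list.
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                if abs(span["size"] - target_size) <= tol:
                    boxes.append(tuple(span["bbox"]))
    return boxes


# Hypothetical sample mimicking the shape of a page text dictionary.
sample = {
    "blocks": [
        {
            "type": 0,
            "lines": [
                {
                    "spans": [
                        {"size": 12.0, "text": "The Industrial Revolution", "bbox": (72, 100, 200, 114)},
                        {"size": 6.0, "text": "1", "bbox": (200, 98, 204, 105)},
                    ]
                }
            ],
        },
        {"type": 1, "bbox": (0, 0, 10, 10)},
    ]
}
result = small_span_boxes(sample)
```

With PyMuPDF, the input would presumably be `page.get_text("dict")`, and each returned box could be passed to `page.add_redact_annot(...)` before calling `page.apply_redactions()`.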
Example text:
"The Industrial Revolution marked a ...
0
votes
0
answers
59
views
Why does the zero point (0, 0) of the coordinate system in the pymupdf library sometimes start from the upper right corner?
Using the doctr library, I recognize text on a PDF file. From the entire text, I select keywords and the coordinates of these words. I receive the coordinates in the following format:
...
0
votes
0
answers
156
views
Compare two pdfs for the same figure
The essence of my problem: I have a lot of incoming PDFs, but for work I don't need all of them, only PDFs from a certain supplier. I defined keywords and coordinates of keywords that only this ...
0
votes
0
answers
41
views
paste pdf image with correct width and height and position to svg file in python using fitz
I am trying to convert a PDF file to SVG with correct formatting.
I am using the fitz (PyMuPDF) library.
The text is formatted correctly, but I can't adjust the image.
"this is my first time working with ...
0
votes
1
answer
52
views
Overlaying two PDFs with an alpha mask at 50%
I'm trying to recover notes I took on an iPad over a PDF, that I saved as a new PDF before the application crashed. This new PDF is corrupted, but I could repair it so that it contains all my notes (...
0
votes
2
answers
36
views
How to differentiate between repeated fields in a pdf using PyMuPDF
I have a pdf where there are some fields like this:
( ) I do not have tax residency outside Brazil.
Today I can fill this type with an X using PyMuPDF, but now a case has arisen where there is a ...