Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

0 votes
0 answers
21 views

Can't get form elements to align properly using pymupdf and reportlab

I'm writing a program that takes form files and turns them into interactive PDF forms by using markers to locate and draw form elements. I've gotten my radio buttons to align on the page properly but ...
BurgundyE30
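The excerpt is cut off, so this is only a guess at the cause. A common source of misalignment when ReportLab and PyMuPDF are mixed is that ReportLab measures y upward from the bottom-left corner, while PyMuPDF measures y downward from the top-left corner. The sketch below flips a rectangle between the two systems and places a text widget with it. It assumes an unrotated page whose page rectangle starts at (0, 0), and `field_name` is a hypothetical name chosen by the caller.

```python
def reportlab_to_pymupdf(rect, page_height):
    """Convert (x0, y0, x1, y1) from a bottom-left origin to a top-left origin."""
    x0, y0, x1, y1 = rect
    # ReportLab's lower edge (y0) becomes PyMuPDF's lower edge, which is the
    # larger y value once the axis is flipped.
    return (x0, page_height - y1, x1, page_height - y0)


def add_text_field(page, rect_bottom_left, field_name):
    """Place a PyMuPDF text widget at a rectangle given in ReportLab coordinates.

    Assumes an unrotated page whose rectangle starts at (0, 0).
    """
    import fitz  # PyMuPDF; imported here so the helper above has no dependency

    widget = fitz.Widget()
    widget.field_type = fitz.PDF_WIDGET_TYPE_TEXT
    widget.field_name = field_name  # hypothetical name supplied by the caller
    widget.rect = fitz.Rect(reportlab_to_pymupdf(rect_bottom_left, page.rect.height))
    page.add_widget(widget)
    return widget
```

If the radio buttons already line up, they may have been created in PyMuPDF's own coordinates while the other elements were not. Applying the same conversion to every element keeps them consistent.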
0 votes
0 answers
16 views

DocumentAI OCR Error: Invalid Document Content

I am calling DocumentAI OCR batch processing from Workflows generally quite successfully, however, I occasionally get the following error: { "caughtError": { "message": "...
Leo Glowacki
0 votes
0 answers
10 views

Match text extracted with unstructured to parsing from PyMuPDF

I am parsing PDFs with unstructured for use in an application. Later I would like to highlight some of these extracted text passages in the actual PDF. I have seen that PyMuPDF has the capabilities to ...
J.N. (reputation 301)
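A minimal sketch of the highlighting step follows. It assumes that the text returned by unstructured differs from the PDF's text layer mainly in line breaks and spacing, so it collapses whitespace before calling `page.search_for`. Harder mismatches, such as ligatures, dehyphenated words, or reordered columns, are not handled here.

```python
import re


def normalize_passage(text):
    """Collapse runs of whitespace (including line breaks) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def highlight_passage(page, passage):
    """Highlight every occurrence of `passage` on a PyMuPDF page.

    Returns the number of rectangles found; 0 means the passage was not matched.
    """
    rects = page.search_for(normalize_passage(passage))
    if rects:
        # add_highlight_annot accepts a list of rectangles for one annotation.
        page.add_highlight_annot(rects)
    return len(rects)
```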
0 votes
1 answer
32 views

PyMuPDF page.search_for(text) splits a line break into two completely different objects

I use PyMuPDF for document redaction. Sometimes I want to redact only one instance of a certain text on a page, so I use an index: `areas = page.search_for(text)`, then `area = areas[index]`. The problem is ...
eliotz (reputation 119)
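`page.search_for` returns one rectangle per line, so a hit that wraps across a line break shows up as two entries and shifts every later index. One heuristic way to regroup them, sketched below, reads the text inside each rectangle with `page.get_textbox` and accumulates rectangles until their combined text is as long as the search string. This is an assumption-laden sketch: if a rectangle picks up neighbouring characters, the grouping can be off.

```python
def _squash(text):
    """Lower-case text and drop whitespace and hyphens for loose comparison."""
    return "".join(ch for ch in text.lower() if not ch.isspace() and ch != "-")


def group_hits(pieces, needle):
    """Group per-line search rectangles into whole hits.

    `pieces` is a list of (rect, text_inside_rect) pairs in search order.
    Rectangles are accumulated until their combined text is at least as long
    as the needle; that group is treated as one complete hit.
    """
    target_len = len(_squash(needle))
    groups, current, seen = [], [], 0
    for rect, text in pieces:
        current.append(rect)
        seen += len(_squash(text))
        if seen >= target_len:
            groups.append(current)
            current, seen = [], 0
    if current:  # incomplete trailing group: keep it rather than drop it
        groups.append(current)
    return groups


def redact_nth_hit(page, needle, index):
    """Redact only the `index`-th occurrence of `needle` on a PyMuPDF page."""
    rects = page.search_for(needle)
    pieces = [(r, page.get_textbox(r)) for r in rects]
    for rect in group_hits(pieces, needle)[index]:
        page.add_redact_annot(rect)
    page.apply_redactions()
```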
0 votes
1 answer
63 views

I read a PDF file using Python, and part of the content is displayed as a string of garbled text. How should I restore it?

I read a PDF file using Python, and part of the content is displayed as a string of garbled text. How should I restore it? `import fitz`, then `doc = fitz.open("2303.11366v4.pdf")  # download from ...`
xxy (reputation 23)
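Garbled extraction often means that a font in the PDF lacks a usable glyph-to-Unicode mapping, so the text layer itself cannot be decoded faithfully. That is an assumption here, since the excerpt does not show the output. The sketch flags text dominated by replacement, private-use, or control characters and then re-reads the page with OCR through `page.get_textpage_ocr`, which requires Tesseract. It catches only that kind of garbling, not text that decodes to wrong but ordinary letters.

```python
import unicodedata


def looks_garbled(text, threshold=0.3):
    """Heuristic: True if many non-space characters are replacement,
    private-use ("Co") or control ("Cc") characters."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return False
    bad = sum(
        1
        for ch in chars
        if ch == "\ufffd" or unicodedata.category(ch) in ("Co", "Cc")
    )
    return bad / len(chars) > threshold


def page_text_with_ocr_fallback(page, language="eng"):
    """Return the page text, re-reading it with OCR if it looks garbled.

    OCR via get_textpage_ocr needs a Tesseract installation.
    """
    text = page.get_text()
    if looks_garbled(text):
        textpage = page.get_textpage_ocr(language=language, full=True)
        text = page.get_text(textpage=textpage)
    return text
```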
0 votes
0 answers
48 views

How to resolve cx_Freeze error about pdf2docx mupdf import?

I've never posted here before, but this time I'm genuinely stuck. I'm using cx_Freeze on a python script that uses pdf2docx. Running the python script normally works fine, but running cx_Freeze's ...
Ploso (reputation 1)
-1 votes
1 answer
40 views

Extracting Text from PDFs with Python Without Including Comments

I have been trying to extract text from PDF files to automate a significant and tedious part of my job using Python. With the help of ChatGPT, I have written multiple lines of code. However, I am ...
MDMT (reputation 1)
1 vote
2 answers
68 views

Handling malformed PDF with MuPDF

I am using MuPDF to read some PDF documents, and recently I have started getting some malformed PDFs coming from Google GSuite where the hex header in the beginning has some additional bytes added in ...
Travis Lu
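The excerpt is truncated, so the exact damage is unknown. Assuming the extra bytes sit in front of the `%PDF-` marker, one workaround is to read the file into memory, drop everything before the marker, and open the cleaned bytes from a stream. If the cross-reference offsets still do not match after that, MuPDF attempts to repair the file while opening it.

```python
def strip_leading_junk(data):
    """Drop any bytes that precede the b"%PDF-" header marker.

    If no marker is found, or it is already at offset 0, data is returned as is.
    """
    start = data.find(b"%PDF-")
    return data[start:] if start > 0 else data


def open_tolerant(path):
    """Open a PDF with PyMuPDF after removing junk bytes in front of its header."""
    import fitz  # PyMuPDF

    with open(path, "rb") as fh:
        data = strip_leading_junk(fh.read())
    # MuPDF tries to rebuild a broken cross-reference table on open.
    return fitz.open(stream=data, filetype="pdf")
```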
0 votes
0 answers
79 views

How do I replace embedded fonts in PDF

I have a use case where I extract embedded fonts from a PDF, modify them to add a Unicode mapping, and want to put them back. I tried several approaches and all of them failed. My last attempt was using ...
Alexander Weps
0 votes
0 answers
24 views

Delete all footnote/endnote numbers throughout a PDF using PyMuPDF?

I am trying to delete all footnotes/endnotes in a pdf. They all have a font size of six, while the rest of the text has a font size of twelve. Example text: "The Industrial Revolution marked a ...
HTMLHelpMe
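Because the markers differ from body text only by font size, one approach is to walk the span structure returned by `page.get_text("dict")`, collect the bounding boxes of small spans, and redact them. The sketch below treats any span of at most 6.5 pt as small. That threshold is an assumption based on the stated 6 pt versus 12 pt sizes. By default it only collects spans that consist solely of digits, so the marker numbers go but other small text stays. Redaction removes characters overlapping each box, so tightly kerned neighbours may be affected.

```python
def small_span_bboxes(text_dict, max_size=6.5, digits_only=True):
    """Return bboxes of spans whose font size is at most `max_size`.

    `text_dict` is the structure returned by page.get_text("dict").
    With digits_only=True, only spans consisting of digits are kept.
    """
    boxes = []
    for block in text_dict.get("blocks", []):
        for line in block.get("lines", []):  # image blocks have no "lines"
            for span in line["spans"]:
                text = span["text"].strip()
                if span["size"] <= max_size and text and (text.isdigit() or not digits_only):
                    boxes.append(tuple(span["bbox"]))
    return boxes


def delete_small_numbers(page, max_size=6.5):
    """Redact small-font digit spans (footnote/endnote markers) on a page."""
    boxes = small_span_bboxes(page.get_text("dict"), max_size)
    for bbox in boxes:
        page.add_redact_annot(bbox)
    if boxes:
        page.apply_redactions()
    return len(boxes)
```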
0 votes
0 answers
59 views

Why does the zero point (0, 0) of the pymupdf coordinate system sometimes start from the upper right corner?

Using the doctr library, I recognize text on a PDF file. From the entire text, I select keywords and the coordinates of these words. I receive the coordinates in the following format: ...
Paul (reputation 125)
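The question does not confirm this, but one plausible explanation is a `/Rotate` entry on some pages. doctr works on the page image as displayed, while PyMuPDF methods such as `insert_text` and `add_redact_annot` take coordinates on the unrotated page, whose origin is the top-left corner. The sketch assumes doctr boxes are relative `(xmin, ymin, xmax, ymax)` values in 0..1. It scales them by `page.rect`, which already reflects rotation, and maps the result back with `page.derotation_matrix`.

```python
def relative_to_absolute(box, width, height):
    """Scale a relative (0..1) box (xmin, ymin, xmax, ymax) to page units."""
    xmin, ymin, xmax, ymax = box
    return (xmin * width, ymin * height, xmax * width, ymax * height)


def doctr_box_to_pdf_rect(page, box):
    """Map a relative box from the rendered (possibly rotated) page onto the
    unrotated page that PyMuPDF's drawing and annotation methods use."""
    import fitz  # PyMuPDF

    # page.rect describes the page as displayed, i.e. after /Rotate is applied.
    r = fitz.Rect(relative_to_absolute(box, page.rect.width, page.rect.height))
    return r * page.derotation_matrix
```

Checking `page.rotation` for the pages where the origin seems to move is a quick way to test this explanation.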
0 votes
0 answers
156 views

Compare two pdfs for the same figure

The essence of my problem: I have a lot of incoming PDFs, but for work I don't need all of them, only PDFs from a certain supplier. I defined keywords and coordinates of keywords that only this ...
Paul (reputation 125)
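The keyword-plus-coordinates idea can be checked against `page.get_text("words")`, which returns one tuple per word: `(x0, y0, x1, y1, word, block_no, line_no, word_no)`. The sketch below treats a page as matching a supplier if every keyword appears inside its expected region, with a small tolerance for layout jitter. The rule list and tolerance are hypothetical values you would tune for your documents.

```python
def keyword_in_region(words, keyword, region, tol=2.0):
    """True if `keyword` appears as a word fully inside `region` (x0, y0, x1, y1).

    `words` is the list returned by page.get_text("words").
    """
    rx0, ry0, rx1, ry1 = region
    target = keyword.casefold()
    for x0, y0, x1, y1, word, *_ in words:
        if word.strip(".,:;").casefold() != target:
            continue
        if x0 >= rx0 - tol and y0 >= ry0 - tol and x1 <= rx1 + tol and y1 <= ry1 + tol:
            return True
    return False


def matches_supplier(page, rules):
    """True if every (keyword, region) rule holds on the page.

    `rules` is a hypothetical list such as [("ACME", (40, 30, 220, 80))].
    """
    words = page.get_text("words")
    return all(keyword_in_region(words, kw, region) for kw, region in rules)
```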
0 votes
0 answers
41 views

Paste a PDF image into an SVG file with the correct width, height and position in Python using fitz

I am trying to convert a PDF file to SVG with correct formatting. I am using the fitz (PyMuPDF) library; the text is formatted correctly, but I can't adjust the image. This is my first time working with ...
ahmad tayyab
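If the SVG is assembled by hand, each image's placement on the page can be read from `page.get_image_info()`, whose entries include a `"bbox"`. PyMuPDF and SVG both put the origin at the top-left with y growing downward, so the bbox maps directly onto `x`, `y`, `width` and `height`, provided the SVG `viewBox` equals the page rectangle. The file names below are placeholders. Older viewers may need `xlink:href` instead of `href`. For a whole-page conversion, `page.get_svg_image()` is the simpler route.

```python
def svg_image_tag(bbox, href):
    """Build an SVG <image> element covering a PDF bbox (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bbox
    return (
        f'<image x="{x0:g}" y="{y0:g}" '
        f'width="{x1 - x0:g}" height="{y1 - y0:g}" href="{href}"/>'
    )


def page_image_tags(page):
    """One <image> tag per image placement; file names are hypothetical placeholders."""
    return [
        svg_image_tag(info["bbox"], f"image_{i}.png")
        for i, info in enumerate(page.get_image_info())
    ]
```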
0 votes
1 answer
52 views

Overlaying two PDFs with an alpha mask at 50%

I'm trying to recover notes I took on an iPad over a PDF, which I saved as a new PDF before the application crashed. This new PDF is corrupted, but I could repair it so that it contains all my notes (...
Hadriensz
0 votes
2 answers
36 views

How to differentiate between repeated fields in a pdf using PyMuPDF

I have a pdf where there are some fields like this: ( ) I do not have tax residency outside Brazil. Today I can fill this type with an X using PyMuPDF, but now a case has arisen where there is a ...
Fábio Mattes
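When the same `( )` marker appears several times, the label text next to it can disambiguate. The sketch below searches for the label, picks the `( )` rectangle on the same line that ends closest to the label's left edge, and writes an `X` into it with `page.insert_text`. It assumes the box is extracted as the literal text `( )` and sits to the left of its label, as in the example. The small offsets used to centre the mark are rough guesses.

```python
def box_for_label(boxes, label):
    """Pick the box rect on the same line as, and closest to the left of, `label`.

    Rects are (x0, y0, x1, y1) tuples with y growing downward.
    """
    lx0, ly0, _, ly1 = label
    candidates = [
        b for b in boxes
        if b[2] <= lx0 + 1                       # box ends left of the label
        and min(b[3], ly1) - max(b[1], ly0) > 0  # vertical overlap: same line
    ]
    return max(candidates, key=lambda b: b[2]) if candidates else None


def mark_option(page, label_text, box_text="( )", mark="X", fontsize=11):
    """Write `mark` inside the box that precedes `label_text` on the page."""
    labels = page.search_for(label_text)
    if not labels:
        return False
    boxes = [tuple(r) for r in page.search_for(box_text)]
    box = box_for_label(boxes, tuple(labels[0]))
    if box is None:
        return False
    x0, _, x1, y1 = box
    # insert_text takes the start of the baseline; nudge it to roughly centre the mark.
    page.insert_text(((x0 + x1) / 2 - fontsize * 0.3, y1 - 2), mark, fontsize=fontsize)
    return True
```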
