Newest 'pymupdf' Questions

0 votes

0 answers

21 views

Can't get form elements to align properly using pymupdf and reportlab

I'm writing a program that takes form files and turns them into interactive PDF forms by using markers to locate and draw form elements. I've gotten my radio buttons to align on the page properly but ...

BurgundyE30

1

asked 18 hours ago

0 votes

0 answers

16 views

DocumentAI OCR Error: Invalid Document Content

I am calling DocumentAI OCR batch processing from Workflows, generally quite successfully; however, I occasionally get the following error: { "caughtError": { "message": "...

Leo Glowacki

101

asked Jul 23 at 19:47

0 votes

0 answers

10 views

Match text extracted with unstructured to parsing from PyMuPDF

I am parsing PDFs with unstructured for use in an application. Later I would like to highlight some of these extracted text passages in the actual PDF. I have seen that pymupdf has the capabilities to ...

J.N.

301

asked Jul 19 at 16:00

0 votes

1 answer

32 views

PyMuPDF page.search_for(text) splits a line break into two completely different objects

I use PyMuPDF for document redaction. Sometimes I want to redact only one instance of a certain text on a page, so I use an index: areas = page.search_for(text) area = areas[index] The problem is ...

eliotz

119

asked Jul 16 at 9:53

0 votes

1 answer

63 views

I read a PDF file using Python, and part of the content is displayed as a string of garbled text. How should I restore it?

I read a PDF file using Python, and part of the content is displayed as a string of garbled text. How should I restore it? import fitz doc = fitz.open("2303.11366v4.pdf")# download from ...

xxy

23

asked Jul 14 at 16:28

0 votes

0 answers

48 views

How to resolve cx_Freeze error about pdf2docx mupdf import?

I've never posted here before, but this time I'm genuinely stuck. I'm using cx_Freeze on a python script that uses pdf2docx. Running the python script normally works fine, but running cx_Freeze's ...

Ploso

1

asked Jul 9 at 16:38

-1 votes

1 answer

40 views

Extracting Text from PDFs with Python Without Including Comments

I have been trying to extract text from PDF files to automate a significant and tedious part of my job using Python. With the help of ChatGPT, I have written multiple lines of code. However, I am ...

MDMT

1

asked Jul 8 at 12:42

1 vote

2 answers

68 views

Handling malformed PDF with MuPDF

I am using MuPDF to read some PDF documents, and recently I have started getting some malformed PDFs coming from Google GSuite where the hex header in the beginning has some additional bytes added in ...

Travis Lu

57

asked Jul 5 at 18:52

0 votes

0 answers

79 views

How do I replace embedded fonts in PDF

I have a use case where I extract embedded fonts from PDF to modify them to add unicode mapping and I want to put them back. I tried several approaches and everything fails. Last attempt was using ...

Alexander Weps

1

asked Jun 28 at 15:38

0 votes

0 answers

24 views

Delete all footnote/endnote numbers throughout a PDF using PyMuPDF?

I am trying to delete all footnotes/endnotes in a pdf. They all have a font size of six, while the rest of the text has a font size of twelve. Example text: "The Industrial Revolution marked a ...

HTMLHelpMe

311

asked Jun 28 at 8:22

0 votes

0 answers

59 views

Why does the zero point (0, 0) of the coordinate system in the pymupdf library sometimes start from the upper right corner?

Using the doctr library, I recognize text on a PDF file. From the entire text, I select keywords and the coordinates of these words. I receive the coordinates in the following format: ...

Paul

125

asked Jun 26 at 7:24

0 votes

0 answers

156 views

Compare two pdfs for the same figure

The essence of my problem: I have a lot of incoming PDFs, but for work I don't need all of them, only PDFs from a certain supplier. I defined keywords and coordinates of keywords that only this ...

Paul

125

asked Jun 24 at 11:35

0 votes

0 answers

41 views

Paste PDF image with correct width, height, and position into SVG file in Python using fitz

I am trying to convert a PDF file to SVG with correct formatting. I am using the fitz (PyMuPDF) library. The text is formatted correctly, but I can't adjust the image. "This is my first time working with ...

ahmad tayyab

1

asked Jun 24 at 8:45

0 votes

1 answer

52 views

Overlaying two PDFs with an alpha mask at 50%

I'm trying to recover notes I took on an iPad over a PDF, that I saved as a new PDF before the application crashed. This new PDF is corrupted, but I could repair it so that it contains all my notes (...

Hadriensz

11

asked Jun 17 at 7:51

0 votes

2 answers

36 views

How to differentiate between repeated fields in a pdf using PyMuPDF

I have a pdf where there are some fields like this: ( ) I do not have tax residency outside Brazil. Today I can fill this type with an X using PyMuPDF, but now a case has arisen where there is a ...

Fábio Mattes

1

asked Jun 13 at 14:40
