Newest 'pdfminer' Questions

-4 votes

0 answers

55 views

To make a Letterhead consisting of Company Name, Logo, Address using Python, PDFMiner, Regex, Canvas, ReportLab and any other tools

Create a letterhead in PDF format using Python, in which we can use pdfminer, text extraction, regex, and any other such tools. That letterhead should contain the name and logo of that company and company ...

Craka

43

asked Jul 10 at 7:06

0 votes

0 answers

25 views

Edit a pdf - remove a line from the pdf if it contains an "X" character

I'd like to mass-edit PDF files, and more specifically to remove the line containing the "X" character from all my PDF files, but I can't do it. My document has 3 parts: a header with ...

fatiha moscatiello

1

asked Jul 5 at 9:07

0 votes

0 answers

49 views

Python PDF page size

I am trying to get the page sizes of the pages in my PDF. I have tried using both PyPDF2 and pdfminer, and I get the same results from both: 423.024x639.024 for artbox, cropbox, etc, and 459.048x675.048 ...

calwex718

87

asked Jul 1 at 18:07

0 votes

0 answers

34 views

How can I add a string on pdf file with shipping label layout using python?

I'm trying to add a string to a shipping label that is in PDF format. I'm not 100% sure whether Python is the right tool for this task, but as a noob who just got into the coding world, it's the only language that I can ...

Dongwon Kang

1

asked Jul 1 at 14:15

0 votes

0 answers

53 views

Extracting screenshots from an Exam Paper for questions and their parts

I want to extract screenshots of questions from a PDF exam paper. I want the question and its parts to be separated, so the actual question's introduction would be in a different screenshot and the ...

Mario_Dev

1

asked Jun 18 at 6:48

0 votes

1 answer

46 views

Text extracted from Pdfminer python library is empty but the length of text is not 0

I have Python code in which I am trying to read the contents of various PDF files, both scanned and text-based, using pdfminer. The code is like this: ``with open(os.path.join(pdf_directory, ...

dsnoob27

71

asked Jun 5 at 5:13
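
A minimal sketch of one possible explanation for the symptom in the question above, not a confirmed diagnosis: extracted text can have a non-zero length while containing only whitespace, for example the form feed character ("\x0c") that pdfminer.six emits at page boundaries. This may be what happens with scanned PDFs that have no text layer. The helper name `has_visible_text` is hypothetical and is not part of pdfminer.

```python
def has_visible_text(text: str) -> bool:
    """Return True if the string contains any non-whitespace character.

    str.strip() removes spaces, newlines, and form feeds ("\x0c"), so a
    string made only of those characters is treated as empty here.
    """
    return bool(text.strip())


# Assumed example values, standing in for real extraction output.
scanned_like = "\x0c\x0c"        # length 2, but nothing visible
text_based = "\x0cHello world\n"  # contains real characters

print(len(scanned_like), has_visible_text(scanned_like))
print(len(text_based), has_visible_text(text_based))
```

If a check like this reports no visible text for a scanned file, the page content is probably an image, and OCR would be needed rather than text extraction.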

-2 votes

1 answer

62 views

Issue when installing pdfminer.six in Python 3.12.2 on Windows 10

I want to install pdfminer.six to extract text from PDF files. I cannot do it with Python 3.12.2. This is the output of the command pip install pdfminer.six: PS C:\Users\Admin\resume_ai> pip install ...

ben amor Lamia

1

asked May 17 at 9:11

0 votes

0 answers

35 views

Python pdfminer LAParams not able to extract bullet points as paragraphs

I have a PDF file and I want to parse text from it with pdfminer. The problem is that LAParams is not able to extract bullet points as lines. I can't figure out why. My PDF looks like this: pdf Output looks ...

Pranav

11

asked May 1 at 20:01

0 votes

0 answers

42 views

Regular Expression extracting strings between matches

I am trying to assign the text of a PDF to columns and using a regular expression to get the values in between matches in column headers. Ultimately for CSV. I am getting the Year data printed twice, ...

Brooke

19

asked Apr 30 at 1:46

1 vote

0 answers

40 views

Is there a way to obtain the coordinates of the position of the character and not of the bbox?

I have this code that takes the characteristics of the characters from a PDF and copies them to another PDF, but the problem is that the y is not in the same position since I don't know why ...

David Gazpio

11

asked Apr 2 at 13:11

0 votes

0 answers

26 views

I have made this code with pdfminer to access the structure of a PDF. In this case I only access the LTChar structure (characters)

from pdfminer.high_level import extract_pages from pdfminer.layout import LTContainer, LTTextContainer, LTChar def mostrar_estructura(pagina): def buscar_ltchar(elemento): if isinstance(...

David Gazpio

11

asked Mar 26 at 15:48

0 votes

0 answers

27 views

Extracting char location from PDF gives the wrong y coordinates

I use pdfminer to get the location of chars from a PDF, but it seems to give the wrong y coordinates: when I convert the PDF to an image, all the pixels are in the wrong place. I understand the ...

Elia Weiss

8,992

asked Feb 6 at 9:45
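
A minimal sketch related to the question above, under stated assumptions: pdfminer reports bounding boxes in PDF points with the origin at the bottom-left of the page, while raster images usually place the origin at the top-left. The function name `bbox_to_image_box` is hypothetical, and the sketch assumes an unrotated page whose lower-left corner is at (0, 0) and an image rendered at a known DPI.

```python
def bbox_to_image_box(bbox, page_height_pt, dpi=72.0):
    """Convert a PDF-space bbox (x0, y0, x1, y1) to image pixel coordinates.

    PDF space: origin bottom-left, units of 1/72 inch, y grows upward.
    Image space: origin top-left, units of pixels, y grows downward.
    Returns (left, top, right, bottom) in pixels.
    """
    x0, y0, x1, y1 = bbox
    scale = dpi / 72.0  # pixels per PDF point
    left = x0 * scale
    right = x1 * scale
    # Flip the vertical axis: the PDF top edge (y1) becomes the image top.
    top = (page_height_pt - y1) * scale
    bottom = (page_height_pt - y0) * scale
    return (left, top, right, bottom)


# Assumed example: a character box on a US Letter page (792 pt tall),
# mapped onto an image rendered at 144 DPI.
print(bbox_to_image_box((72.0, 700.0, 144.0, 720.0), 792.0, dpi=144.0))
# -> (144.0, 144.0, 288.0, 184.0)
```

Pages with a rotation or a media box that does not start at (0, 0) would need an extra offset or transform that this sketch does not handle.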

0 votes

0 answers

31 views

Is it possible for me to separate a PDF with pdfminer based on straight horizontal lines?

I have a PDF with multiple tables, and I haven't been successful in extracting them with tabula. I noticed every table has a straight horizontal line at the top and another at the bottom, so I'd ...

Volts_to_cables

1

asked Jan 24 at 17:02

0 votes

0 answers

57 views

pdf miner adding extra new lines

While employing PDFMinerLoader to parse PDF files, I've observed that it introduces additional new lines when encountering bullets or numbers. For example: Original pdf: use the... replace the.. ...

lali

1

asked Dec 24, 2023 at 6:19

0 votes

1 answer

123 views

PDF to CSV - converted CSV has interchanged column Contents

I am trying to convert a PDF file into CSV using Python and have written the code below for this. It was working earlier, but recently it's not working. I am getting interchanged column contents in the ...

linux01

73

asked Oct 23, 2023 at 10:03
