
Questions tagged [pdfminer]

A Python-based tool for extracting information from PDF documents.

pdfminer
-4 votes
0 answers
55 views

Make a letterhead consisting of company name, logo, and address using Python, PDFMiner, regex, Canvas, ReportLab, and any other tools

Create a letterhead in PDF format using Python, in which we can use pdfminer, text extraction, regex, and any other such tools. The letterhead should contain the name and logo of the company and company ...
Craka's user avatar
  • 43
0 votes
0 answers
25 views

Edit a pdf - remove a line from the pdf if it contains an "X" character

I'd like to mass-edit PDF files, and more specifically to remove the line containing the "X" character from all my PDF files, but I can't do it. My document has 3 parts: a header with ...
fatiha moscatiello's user avatar
0 votes
0 answers
49 views

Python PDF page size

I am trying to get the page sizes of the pages in my PDF. I have tried both PyPDF2 and pdfminer and get the same results from both: 423.024x639.024 for artbox, cropbox, etc., and 459.048x675.048 ...
calwex718's user avatar
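
For readers comparing box sizes like the ones quoted above, a minimal sketch of converting PDF units to physical sizes may help. It assumes the default PDF user unit of 1/72 inch per point, and the helper names (`points_to_inches`, `points_to_mm`, `box_size`) are hypothetical, not part of PyPDF2 or pdfminer.

```python
# Assumption: the page uses the default PDF user unit (1 point = 1/72 inch).
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def points_to_inches(points: float) -> float:
    """Convert a length in PDF points to inches."""
    return points / POINTS_PER_INCH


def points_to_mm(points: float) -> float:
    """Convert a length in PDF points to millimetres."""
    return points_to_inches(points) * MM_PER_INCH


def box_size(box):
    """Return (width, height) of a box given as (x0, y0, x1, y1) in points."""
    x0, y0, x1, y1 = box
    return (x1 - x0, y1 - y0)


if __name__ == "__main__":
    # Hypothetical box built from the first size quoted in the question.
    width, height = box_size((0.0, 0.0, 423.024, 639.024))
    print(points_to_inches(width), points_to_inches(height))
    print(points_to_mm(width), points_to_mm(height))
```

Comparing the converted sizes of the different boxes is one way to see whether the larger box simply adds a fixed margin around the smaller one.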
0 votes
0 answers
34 views

How can I add a string to a PDF file with a shipping label layout using Python?

I'm trying to add a string to a shipping label that is in PDF format. I'm not 100% sure Python is the right tool for this task, but as a noob who just got into the coding world, it's the only language that I can ...
Dongwon Kang's user avatar
0 votes
0 answers
53 views

Extracting screenshots from an Exam Paper for questions and their parts

I want to extract screenshots of questions from a PDF exam paper. I wanted the question and its parts to be separated, so the actual question's introduction would be in a different screenshot and the ...
Mario_Dev's user avatar
0 votes
1 answer
46 views

Text extracted with the pdfminer Python library is empty, but the length of the text is not 0

I have Python code in which I am trying to read the contents of various PDF files, both scanned and text-based, using pdfminer. The code is like this: ``with open(os.path.join(pdf_directory, ...
dsnoob27's user avatar
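
A possible explanation, offered as an assumption because the excerpt is truncated: scanned pages have no text layer, so the extracted string may contain only whitespace such as newlines and form feeds, which print as nothing but still count toward `len()`. The sketch below shows one way to detect that case; `has_visible_text` is a hypothetical helper, not a pdfminer function.

```python
def has_visible_text(text: str) -> bool:
    """Return True if the string contains anything other than whitespace.

    str.strip() removes spaces, newlines, tabs, and form feeds ('\x0c'),
    so a string made only of those characters strips down to ''.
    """
    return bool(text.strip())


if __name__ == "__main__":
    # Hypothetical output for a scanned page: whitespace only.
    extracted = "\n\n\x0c"
    print(len(extracted))               # the length is non-zero
    print(repr(extracted))              # repr() reveals the hidden characters
    print(has_visible_text(extracted))  # no visible text
```

Printing `repr(text)` instead of `text` is a quick way to check what the non-empty string actually contains. Files that fail this check would need OCR rather than text extraction.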
-2 votes
1 answer
62 views

Issue when installing pdfminer.six in Python 3.12.2 on Windows 10

I want to install pdfminer.six to extract text from PDF files. I cannot do it with Python 3.12.2. This is the output of the command pip install pdfminer.six: PS C:\Users\Admin\resume_ai> pip install ...
ben amor Lamia's user avatar
0 votes
0 answers
35 views

Python pdfminer LAParams not able to extract bullet points as paragraphs

I have a PDF file and I want to parse text from it with pdfminer. The problem is that LAParams is not able to extract bullet points as lines. I can't figure out why. My PDF looks like this: pdf. Output looks ...
Pranav's user avatar
  • 11
0 votes
0 answers
42 views

Regular Expression extracting strings between matches

I am trying to assign the text of a PDF to columns, using a regular expression to get the values in between matches in column headers. The ultimate goal is a CSV. I am getting the Year data printed twice, ...
Brooke's user avatar
  • 19
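
As a rough illustration of extracting the values between header matches, here is a minimal sketch. The header names and sample text are hypothetical, since the question excerpt does not show the real data, and `values_between_headers` is an illustrative helper, not the asker's code.

```python
import re


def values_between_headers(text, headers):
    """Map each header to the text found between it and the next header.

    A lookahead is used for the next header so that it is not consumed
    and remains available as the start of the following match.
    """
    result = {}
    for i, header in enumerate(headers):
        start = re.escape(header)
        if i + 1 < len(headers):
            next_header = re.escape(headers[i + 1])
            pattern = start + r"\s*(.*?)\s*(?=" + next_header + ")"
        else:
            # Last header: take everything up to the end of the text.
            pattern = start + r"\s*(.*)"
        match = re.search(pattern, text, flags=re.DOTALL)
        result[header] = match.group(1).strip() if match else None
    return result


if __name__ == "__main__":
    # Hypothetical flattened PDF text with three column headers.
    sample = "Year 2021 Make Ford Model Focus"
    print(values_between_headers(sample, ["Year", "Make", "Model"]))
```

Each header is searched once, so every column gets a single value. The resulting dictionary can be passed to `csv.DictWriter` to produce the CSV row.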
1 vote
0 answers
40 views

Is there a way to obtain the coordinates of the position of the character and not of the bbox?

I have code that takes the characteristics of the characters from a PDF and copies them to another PDF, but the problem is that the y is not in the same position, and I don't know why ...
David Gazpio's user avatar
0 votes
0 answers
26 views

I have made this code with pdfminer to access the structure of a PDF. In this case I only access the LTChar structure (characters)

from pdfminer.high_level import extract_pages from pdfminer.layout import LTContainer, LTTextContainer, LTChar def mostrar_estructura(pagina): def buscar_ltchar(elemento): if isinstance(...
David Gazpio's user avatar
0 votes
0 answers
27 views

Extracting char locations from a PDF gives the wrong y coordinates

I use pdfminer to get the location of chars from a PDF, but it seems to give the wrong y coordinates. When I convert the PDF to an image, all the pixels are in the wrong place. I understand the ...
Elia Weiss's user avatar
  • 8,992
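
One common cause of this kind of mismatch is the coordinate origin: pdfminer reports bounding boxes with the origin at the bottom-left of the page, while image libraries usually put the origin at the top-left. The sketch below flips the y axis and applies a render scale. It assumes an unrotated page whose origin is at (0, 0), and `pdf_bbox_to_image_box` is a hypothetical helper name.

```python
def pdf_bbox_to_image_box(bbox, page_height, scale=1.0):
    """Convert a PDF bbox (x0, y0, x1, y1), origin bottom-left, in points,
    to an image box (left, top, right, bottom), origin top-left, in pixels.

    scale is pixels per point, e.g. dpi / 72 for a page rendered at `dpi`.
    Assumes the page is not rotated and its origin is at (0, 0).
    """
    x0, y0, x1, y1 = bbox
    left = x0 * scale
    right = x1 * scale
    # The top edge of the PDF box (y1) is the one closest to the image top.
    top = (page_height - y1) * scale
    bottom = (page_height - y0) * scale
    return (left, top, right, bottom)


if __name__ == "__main__":
    # Hypothetical character bbox on a page that is 792 points tall.
    print(pdf_bbox_to_image_box((10, 700, 20, 712), page_height=792))
    # Same bbox for an image rendered at 144 dpi (scale = 144 / 72 = 2).
    print(pdf_bbox_to_image_box((10, 700, 20, 712), page_height=792, scale=2.0))
```

With pdfminer, the page height can be taken from the `LTPage` object's `height` attribute, and the bbox from an `LTChar` object's `bbox` attribute.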
0 votes
0 answers
31 views

Is it possible to separate a PDF with pdfminer based on straight horizontal lines?

I have a PDF with multiple tables, and I haven't been successful extracting them with tabula. I noticed every table has a straight horizontal line at the top and another at the bottom, so I'd ...
Volts_to_cables's user avatar
0 votes
0 answers
57 views

pdfminer adding extra new lines

While employing PDFMinerLoader to parse PDF files, I've observed that it introduces additional new lines when encountering bullets or numbers. For example: Original pdf: use the... replace the.. ...
lali's user avatar
  • 1
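
If the extra new lines leave a bullet or number alone on its own line, one workaround is to post-process the extracted text and rejoin each marker with the line that follows it. This is a minimal sketch under that assumption; the marker pattern and the `merge_bullet_lines` helper are hypothetical and not part of PDFMinerLoader or pdfminer.

```python
import re

# Assumption: a "marker line" holds only a bullet (•, -, *) or a number
# followed by '.' or ')', with optional surrounding whitespace.
BULLET_ONLY = re.compile(r"^\s*(?:[\u2022\-*]|\d+[.)])\s*$")


def merge_bullet_lines(text: str) -> str:
    """Join a line that contains only a list marker with the next non-blank line."""
    merged = []
    pending = None  # marker waiting for its text
    for line in text.splitlines():
        if pending is not None:
            if not line.strip():
                continue  # skip blank lines between the marker and its text
            merged.append(f"{pending} {line.strip()}")
            pending = None
        elif BULLET_ONLY.match(line):
            pending = line.strip()
        else:
            merged.append(line)
    if pending is not None:
        merged.append(pending)  # trailing marker with no text after it
    return "\n".join(merged)


if __name__ == "__main__":
    # Hypothetical extracted text with markers split from their items.
    raw = "\u2022\n\nuse the tool\n1.\nreplace the part"
    print(merge_bullet_lines(raw))
```

Lines that already contain both marker and text are left untouched, so the function is safe to run on output that is only partly affected.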
0 votes
1 answer
123 views

PDF to CSV - converted CSV has interchanged column contents

I am trying to convert a PDF file into CSV using Python and have written the code below to do so. It was working earlier, but recently it has stopped working. I am getting interchanged column contents in the ...
linux01's user avatar
  • 73
