Questions tagged [pdfminer]
A Python-based tool for extracting information from PDF documents.
pdfminer
495
questions
-4
votes
0
answers
55
views
Make a letterhead consisting of company name, logo, and address using Python, pdfminer, regex, Canvas, ReportLab, and any other tools
Create a letterhead in PDF format using Python, in which we can use pdfminer, text extraction, regex, and any other such tools. The letterhead should contain the name and logo of the company and company ...
0
votes
0
answers
25
views
Edit a pdf - remove a line from the pdf if it contains an "X" character
I'd like to mass-edit PDF files, and more specifically to remove the line containing the "X" character from all my PDF files, but I can't do it.
My document has 3 parts: a header with ...
0
votes
0
answers
49
views
Python PDF page size
I am trying to get the page sizes of the pages in my PDF. I have tried both PyPDF2 and pdfminer, and I get the same results from both: 423.024x639.024 for artbox, cropbox, etc., and 459.048x675.048 ...
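When reading sizes like these, it may help to remember that PDF box dimensions are expressed in points, where one point is 1/72 inch under the default user unit. A minimal sketch of the conversion, assuming that default; the helper names are hypothetical and are not part of PyPDF2 or pdfminer:

```python
# Assumption: the PDF uses the default user unit of 1/72 inch per point.
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def points_to_inches(width_pt, height_pt):
    """Convert a page box size from PDF points to inches."""
    return (width_pt / POINTS_PER_INCH, height_pt / POINTS_PER_INCH)


def points_to_mm(width_pt, height_pt):
    """Convert a page box size from PDF points to millimetres."""
    width_in, height_in = points_to_inches(width_pt, height_pt)
    return (width_in * MM_PER_INCH, height_in * MM_PER_INCH)


if __name__ == "__main__":
    # The box sizes quoted in the question, converted for comparison.
    print(points_to_inches(423.024, 639.024))
    print(points_to_inches(459.048, 675.048))
```

Comparing the converted values against the expected trim and bleed sizes is one way to tell which box a tool is reporting.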
0
votes
0
answers
34
views
How can I add a string on pdf file with shipping label layout using python?
I'm trying to add a string to a shipping label that is a PDF file. I'm not 100% sure whether Python is the right tool for this task, but as a noob who just got into the coding world, it's the only language that I can ...
0
votes
0
answers
53
views
Extracting screenshots from an Exam Paper for questions and their parts
I want to extract screenshots of questions from a PDF exam paper. I want the question and its parts to be separated, so the actual question's introduction would be in a different screenshot and the ...
0
votes
1
answer
46
views
Text extracted with the pdfminer Python library is empty, but the length of the text is not 0
I have Python code in which I am trying to read the contents of various PDF files, both scanned and text-based, using pdfminer. The code is like this:
with open(os.path.join(pdf_directory, ...
-2
votes
1
answer
62
views
Issue when installing pdfminer.six in Python 3.12.2 on Windows 10
I want to install pdfminer.six to extract text from PDF files.
I cannot do it with Python 3.12.2.
This is the output of the command pip install pdfminer.six
PS C:\Users\Admin\resume_ai> pip install ...
0
votes
0
answers
35
views
Python pdfminer LAParams not able to extract bullet points as paragraphs
I have a PDF file and I want to parse text from it with pdfminer. The problem is that LAParams is not able to extract bullet points as lines. I can't figure out why.
My PDF looks like this:
pdf Output looks ...
0
votes
0
answers
42
views
Regular Expression extracting strings between matches
I am trying to assign the text of a PDF to columns, using a regular expression to get the values between matches of the column headers, ultimately for CSV output.
I am getting the Year data printed twice, ...
1
vote
0
answers
40
views
Is there a way to obtain the coordinates of the position of the character and not of the bbox?
I have code that takes the characteristics of the characters from a PDF and copies them to another PDF, but the problem is that the y is not in the same position, since I don't know why ...
0
votes
0
answers
26
views
I have made this code with pdfminer to access the structure of a PDF. In this case I only access the LTChar structure (characters)
from pdfminer.high_level import extract_pages
from pdfminer.layout import LTContainer, LTTextContainer, LTChar
def mostrar_estructura(pagina):
    def buscar_ltchar(elemento):
        if isinstance(...
0
votes
0
answers
27
views
Extracting char locations from a PDF gives the wrong y coordinates
I use pdfminer to get the location of chars from a PDF, but it seems to give the wrong y coordinates.
When I convert the PDF to an image, all the pixels are in the wrong place.
I understand the ...
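One frequent source of this kind of mismatch is the coordinate origin: pdfminer layout objects report bounding boxes as (x0, y0, x1, y1) in points with the origin at the bottom-left of the page, while raster images usually place the origin at the top-left. A minimal sketch of the flip, assuming the page is rendered at a known DPI with no rotation or crop offset; the function name and parameters are hypothetical and are not part of pdfminer:

```python
def pdf_bbox_to_pixels(bbox, page_height_pt, dpi=72):
    """Map a PDF-space bbox (origin bottom-left, in points) to an
    image-space box (left, top, right, bottom) with origin top-left.

    Assumptions: no page rotation, and the rendered image covers the
    same box that page_height_pt was measured from.
    """
    x0, y0, x1, y1 = bbox
    scale = dpi / 72.0  # 72 points per inch
    left = x0 * scale
    right = x1 * scale
    # Flip the y axis: the top edge in PDF space (y1) becomes the
    # smaller pixel row in image space.
    top = (page_height_pt - y1) * scale
    bottom = (page_height_pt - y0) * scale
    return (left, top, right, bottom)


if __name__ == "__main__":
    # A character box near the top of a US Letter page, rendered at 144 DPI.
    print(pdf_bbox_to_pixels((72, 700, 144, 720), page_height_pt=792, dpi=144))
```

If the page has a rotation or a crop box that differs from the media box, this flip alone would not be enough and the offsets would need to be applied as well.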
0
votes
0
answers
31
views
Is it possible to separate a PDF with pdfminer based on straight horizontal lines?
I have a PDF with multiple tables where I haven't been successful with tabula for extracting its tables.
I noticed every table has a straight horizontal line at the top and a horizontal line at the bottom too, so I'd ...
0
votes
0
answers
57
views
pdf miner adding extra new lines
While employing PDFMinerLoader to parse PDF files, I've observed that it introduces additional new lines when encountering bullets or numbers. For example:
Original pdf:
use the...
replace the..
...
0
votes
1
answer
123
views
PDF to CSV - converted CSV has interchanged column Contents
I am trying to convert a PDF file into CSV using Python and have written the code below. It was working earlier, but recently it stopped working. I am getting interchanged column contents in the ...