All Questions
Tagged with pandas scikit-learn
3,431 questions
0 votes, 1 answer, 29 views
KNNImputer drops columns despite numeric datatypes and correct shape
I am using KNNImputer to impute np.nan values in several pd.DataFrame objects. I checked that every column in each of the dataframes has a numeric datatype. However, KNNImputer drops some columns in some ...
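The excerpt is cut off, so the actual cause is unknown. One behaviour that fits the description: with default settings, KNNImputer leaves out any column that contained only missing values when it was fitted, even if its dtype is numeric. A minimal sketch with made-up data (the column names `a`, `b`, `c` are assumptions, not taken from the question):

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical frame: every column is float64, but "c" holds only NaN.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [4.0, 5.0, 6.0, 7.0],
    "c": [np.nan, np.nan, np.nan, np.nan],
})

imputer = KNNImputer(n_neighbors=2)
imputed = imputer.fit_transform(df)

# Only columns with at least one observed value survive the imputation,
# so the column labels have to be rebuilt from that subset.
kept_columns = df.columns[df.notna().any()]
result = pd.DataFrame(imputed, columns=kept_columns, index=df.index)

print(df.shape, "->", imputed.shape)
```

Checking `df.notna().any()` before imputing shows which columns would be affected.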
-4 votes, 0 answers, 47 views
Getting an unexpected warning while transforming the x_test data using StandardScaler
import numpy as np
import pandas as pd
df = pd.read_csv('C:/Users/sayed/Downloads/placement.csv')
df = df.iloc[:, 1:]
X = df.iloc[:, 0:2]
Y = df.iloc[:,-1]
from sklearn.model_selection import ...
0 votes, 1 answer, 52 views
To apply the optimal model to the test set [closed]
I have one dataset for training and testing, and another dataset that serves as the test set.
I have obtained the optimal model using the training dataset and would like to apply the model to the test set to make the ...
1 vote, 1 answer, 39 views
How to save a single Random Forest model with cross-validation?
I am using 10-fold cross-validation, trying to predict binary labels (Y) based on the embedding inputs (X).
I want to save one of the models (perhaps the one with the highest ROC AUC). I'm not sure ...
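One way to keep the per-fold models, sketched under assumptions: the data below is synthetic, and the file name `best_rf_fold.joblib` is hypothetical. `cross_validate` with `return_estimator=True` returns the fitted estimator of each fold, so the one with the highest ROC AUC can be selected and written to disk with joblib.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the embedding inputs X and the binary labels Y.
X, y = make_classification(n_samples=200, n_features=16, random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = cross_validate(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X,
    y,
    cv=cv,
    scoring="roc_auc",
    return_estimator=True,  # keep the fitted model of every fold
)

# Pick the fold whose held-out ROC AUC is highest.
best_fold = int(np.argmax(results["test_score"]))
best_model = results["estimator"][best_fold]

# Persist the selected model and load it back (hypothetical file name).
model_path = os.path.join(tempfile.gettempdir(), "best_rf_fold.joblib")
joblib.dump(best_model, model_path)
restored = joblib.load(model_path)
```

The best fold's score is an optimistic estimate, and that model has seen only nine tenths of the data. A common alternative is to use cross-validation only for evaluation and then refit one model on the full training set before saving it.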
-1 votes, 1 answer, 62 views
How can I make my sklearn prediction model better?
I have a model in sklearn that predicts survival on the Titanic. Its accuracy is around 0.77.
How can I make it more accurate?
import pandas as pd
import numpy as np
...
-1 votes, 1 answer, 354 views
ImportError: cannot import name 'check_pandas_support' from 'sklearn.utils'
These are the versions of packages I have:
Python dependencies:
sklearn: 1.5.0
pip: 24.0
setuptools: 70.0.0
numpy: 1.23.1
scipy: 1.13.1
Cython: None
pandas: 2.2.2
matplotlib: 3.8.2
joblib: 1.4.2
...
2 votes, 3 answers, 89 views
Pandas takes all columns of a dataframe even when some columns are specified
I am trying to train a KMeans model using Scikit-Learn.
I have been stuck on this issue for 2 days.
Pandas is selecting all columns of a dataframe even though I specified 2 columns.
Here is the dataframe in ...
0 votes, 0 answers, 31 views
I got ValueError: np.nan is an invalid document, expected byte or unicode string
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# Read the first Excel file with Business codes and descriptions
...
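This error is what TfidfVectorizer raises when one of the documents is NaN rather than a string, which typically happens when a text cell in the spreadsheet is empty. A minimal sketch of one fix, replacing missing cells with empty strings before vectorizing. The series below is a hypothetical stand-in, since the Excel files are not shown:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical description column with one empty cell read in as NaN.
descriptions = pd.Series(["retail bakery", np.nan, "wholesale bakery supplies"])

# Replace missing cells with empty strings so every document is a str.
clean = descriptions.fillna("").astype(str)

tfidf = TfidfVectorizer().fit_transform(clean)
similarity = cosine_similarity(tfidf)
```

An empty document gets an all-zero TF-IDF row, so its similarity to every other document is 0. Dropping those rows with `dropna()` is the other obvious option when they should not be matched at all.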
0 votes, 0 answers, 33 views
Transformation of original data after PCA
Beginner here. I'm trying to calculate a state's infrastructure index using different variables, and I applied PCA.
At first I did a dot product of original data and the principal components.
pca = PCA(...
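For context on the dot-product approach: scikit-learn's PCA centers the data before projecting, so a plain dot product of the original data with the components generally does not reproduce `pca.transform`. A small sketch on random stand-in data (the array `X` and its shape are assumptions), with the default `whiten=False`:

```python
import numpy as np
from sklearn.decomposition import PCA

# Random stand-in for the original variables, with a non-zero mean.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(50, 4))

pca = PCA(n_components=2).fit(X)

# What scikit-learn computes for the component scores.
scores = pca.transform(X)

# Equivalent manual projection: subtract the fitted mean first.
manual = (X - pca.mean_) @ pca.components_.T

# Dot product without centering; it differs by a constant offset per component.
uncentered = X @ pca.components_.T

print(np.allclose(scores, manual))
```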
1 vote, 1 answer, 28 views
Label-encoding the target in a Pipeline
I want to create a pipeline that preprocesses both the training features and the target, then trains the model. The dataset would be something like:
v1 v2 target
0 1 a yes
1 5 c no
2 3 f ...
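A Pipeline's transformer steps act on the features only, so one common arrangement is to encode the target outside the pipeline. The sketch below assumes that layout; every value in the frame is made up for illustration, and only the column names `v1`, `v2` and `target` come from the question:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Hypothetical data; the values are invented, not taken from the question.
df = pd.DataFrame({
    "v1": [1, 5, 3, 2, 4, 6],
    "v2": ["a", "c", "f", "a", "c", "f"],
    "target": ["yes", "no", "yes", "no", "yes", "no"],
})
X = df[["v1", "v2"]]

# Feature preprocessing lives inside the pipeline.
features = ColumnTransformer([
    ("num", StandardScaler(), ["v1"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["v2"]),
])
pipe = Pipeline([("features", features), ("clf", LogisticRegression())])

# Target encoding happens outside the pipeline.
encoder = LabelEncoder()
y = encoder.fit_transform(df["target"])

pipe.fit(X, y)
labels = encoder.inverse_transform(pipe.predict(X))
```

scikit-learn classifiers also accept string labels directly, so the explicit LabelEncoder step is optional when the target only needs to be turned into class ids.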
0 votes, 0 answers, 38 views
Sklearn preprocessors work sequentially but produce NAs when used in Pipeline
Here's the context:
I'm working with a dataset containing various feature types (numerical, categorical).
My task is the binary prediction of startup success dependent on a target variable defined ...
0 votes, 0 answers, 33 views
How to use StandardScaler() with the predict() function for a single row with multiple columns?
I am trying to build a house price prediction system. The data has outliers and is non-Gaussian, and a log transform is used for the target y. I have used StandardScaler() to fit for my model before ...
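A minimal sketch of scaling one new row with an already fitted scaler, under several assumptions: the columns `area` and `rooms`, the prices, and the choice of LinearRegression with `log1p`/`expm1` are all hypothetical stand-ins. The key points are that the row must be 2D (one row, all feature columns, in training order) and that the scaler is only applied with `transform`, never refitted on the new row.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical training data.
train = pd.DataFrame({
    "area": [50.0, 80.0, 120.0, 200.0],
    "rooms": [2.0, 3.0, 4.0, 6.0],
})
price = np.array([100_000.0, 150_000.0, 230_000.0, 400_000.0])

# Fit the scaler on the training features only; the target is log-transformed.
scaler = StandardScaler().fit(train)
model = LinearRegression().fit(scaler.transform(train), np.log1p(price))

# A single new observation as a one-row frame with the training columns.
new_row = pd.DataFrame([{"area": 95.0, "rooms": 3.0}], columns=train.columns)

# Reuse the fitted scaler, then undo the log transform on the prediction.
scaled_row = scaler.transform(new_row)
predicted_price = np.expm1(model.predict(scaled_row))
```

If the row arrives as a flat NumPy array, `row.reshape(1, -1)` gives the same 2D shape.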
0 votes, 0 answers, 45 views
Using scikit-learn IsolationForest with multiple columns of a pandas rolling object
My problem is as follows: I have a time series dataframe like so:
time value1 value2
1 random value here
2
3
4
5
I want to run IsolationForest for outlier detection on a rolling basis for ...
0 votes, 1 answer, 42 views
Why do my F1/precision/recall outputs equal only 1 in every row?
I need help finding out why my for loop outputs only 1s. When I delete the loop, the code works fine and outputs reasonable data, but within the loop, every row of the created df is 1. Why is that?
...
0 votes, 1 answer, 40 views
Improve my f1_score for classification - pandas/sklearn
I would like advice on how to improve my f1_score for classification. I currently have something around 0.57. Dataset:
lotWaferDie - lot, board and chip on which defects were measured
string values ...