All Questions

0 votes
1 answer
29 views

KNNImputer drops columns despite numeric datatypes and correct shape

I am using KNNImputer to impute np.nan values in several pd.DataFrame objects. I checked that every column in each of the DataFrames has a numeric datatype. However, KNNImputer drops some columns in some ...
Ivan
  • 121
-4 votes
0 answers
47 views

Getting an unreasonable warning while transforming the x_test data using StandardScaler

import numpy as np import pandas as pd df = pd.read_csv('C:/Users/sayed/Downloads/placement.csv') df = df.iloc[:, 1:] X = df.iloc[:, 0:2] Y = df.iloc[:,-1] from sklearn.model_selection import ...
Faisal Sayed
0 votes
1 answer
52 views

To apply the optimal model to the test set [closed]

I have a dataset to train and test and another dataset as the test set. I have obtained the optimal model using the training dataset and would like to apply the model to the test set to make the ...
unleasehed
1 vote
1 answer
39 views

How to save single Random Forest model with cross validation?

I am using 10 fold cross validation, trying to predict binary labels (Y) based on the embedding inputs (X). I want to save one of the models (perhaps the one with the highest ROC AUC). I'm not sure ...
youtube
  • 425
-1 votes
1 answer
62 views

How can I make my sklearn prediction model better?

I have a model in sklearn that predicts the survival rate of Titanic passengers. Its accuracy is around 0.77. How can I make it more accurate? import pandas as pd import numpy as np ...
Reza Yekta
-1 votes
1 answer
354 views

ImportError: cannot import name 'check_pandas_support' from 'sklearn.utils'

These are the versions of packages I have: Python dependencies: sklearn: 1.5.0 pip: 24.0 setuptools: 70.0.0 numpy: 1.23.1 scipy: 1.13.1 Cython: None pandas: 2.2.2 matplotlib: 3.8.2 joblib: 1.4.2 ...
staplegun
2 votes
3 answers
89 views

Pandas takes all columns of a dataframe even when some columns are specified

I am trying to train a KMeans model using scikit-learn. I have been stuck on this issue for 2 days. Pandas is selecting all columns of a DataFrame even though I specified 2 columns. Here is the dataframe in ...
Shree_ML
0 votes
0 answers
31 views

I got ValueError: np.nan is an invalid document, expected byte or unicode string

import pandas as pd from sklearn.feature_extraction.text import TfidfVectorizer from sklearn.metrics.pairwise import cosine_similarity # Read the first Excel file with Business codes and descriptions ...
mobinhb
0 votes
0 answers
33 views

Transformation of original data after PCA

Beginner here. I'm trying to calculate a state's infrastructure index using different variables, and I applied PCA. At first I took the dot product of the original data and the principal components. pca = PCA(...
Shriya Desikan
1 vote
1 answer
28 views

Label-encode the target in a Pipeline

I want to create a pipeline that preprocesses both the training features and the target, then trains the model. The dataset would be something like: v1 v2 target 0 1 a yes 1 5 c no 2 3 f ...
Fernando Quintino
0 votes
0 answers
38 views

Sklearn preprocessors work sequentially but produce NAs when used in Pipeline

Here's the context: I'm working with a dataset containing various feature types (numerical, categorical). My task is the binary prediction of startup success dependent on a target variable defined ...
Elias Hofmann
0 votes
0 answers
33 views

How to use StandardScaler() with the predict() function for a single row with multiple columns?

I am trying to build a house price prediction system. The data has outliers and is non-Gaussian, and a log transform is applied to the target feature y. I have used StandardScaler() to fit for my model before ...
vizzy bhagat
0 votes
0 answers
45 views

Using scikit-learn IsolationForest with multiple columns of a pandas rolling object

My problem is as follows: I have a time series DataFrame like so: time value1 value2 1 random value here 2 3 4 5 I want to run IsolationForest for outlier detection on a rolling basis for ...
bachts
  • 63
0 votes
1 answer
42 views

Why do my F1/precision/recall outputs equal only 1 in every row?

I need help finding out why my for loop outputs only 1s. When I delete the loop, the code works fine and outputs reasonable data, but within the loop, every row of the created df is 1. Why is that? ...
tommy
  • 1
0 votes
1 answer
40 views

Improve my f1_score for classification - pandas/sklearn

I would like advice on how to improve my f1_score for classification. I currently have something around 0.57. Dataset: lotWaferDie - lot, board and chip on which defects were measured string values ...
Aaron7
  • 279
