Questions tagged [imbalanced-data]
The imbalanced-data tag has no usage guidance, but it has a tag wiki.
352 questions
-2
votes
0
answers
10
views
Solving the imbalanced numbers of DICOM data in medical MRI datasets
There is a dataset of brain MRIs in DICOM format that we want to use for training a model. Every ID has 3 folders, and each folder contains between 16 and 20 files. My problem now is how I can balance ...
0
votes
0
answers
12
views
ROSE package in R not reading variable correctly; does not read updated value contained in variable
I'm hoping to receive some help here as I've struggled for a while now and I cannot figure out the problem.
I am using the ROSE package in R, attempting to make use of the function for random over/...
0
votes
0
answers
5
views
Imbalanced Dataset Correlation in Machine Learning
If there is an imbalanced dataset, I cannot figure out the correlation or dependency of the target column on different features. How can I check that?
I am using countplot but with that, I cannot ...
0
votes
0
answers
167
views
Managing problems of class imbalance in machine learning models using spatial data in R
I am trying to simultaneously perform feature selection and hyperparameter tuning on stacked learners (glmnet and rpart). However, I am encountering the following error message with the classif.glmnet ...
0
votes
0
answers
12
views
I faced an error when I used PCA with LSTM model
I have a time series dataset with 20 classes, but they are imbalanced. When I tried a method like "RandomOverSampler", I got an error because our data is 3D, so could you suggest a ...
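One way around the 2D-only restriction is to oversample indices instead of the arrays themselves, then use those indices to select whole sequences. The following is a minimal sketch under that assumption, not the asker's code; `oversample_indices` and the toy data are hypothetical names introduced for illustration.

```python
import random
from collections import defaultdict

def oversample_indices(labels, seed=0):
    """Return sample indices in which every class appears as often as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = max(len(idxs) for idxs in by_class.values())
    result = []
    for idxs in by_class.values():
        result.extend(idxs)
        # Draw the missing samples with replacement from the same class.
        result.extend(rng.choices(idxs, k=target - len(idxs)))
    rng.shuffle(result)
    return result

# Hypothetical toy data: 5 sequences, each with 2 timesteps and 3 features.
X = [[[i] * 3 for _ in range(2)] for i in range(5)]
y = [0, 0, 0, 1, 2]

idx = oversample_indices(y)
X_res = [X[i] for i in idx]  # whole sequences are duplicated, never split
y_res = [y[i] for i in idx]
```

Because only indices are resampled, the shape of each sample does not matter. With a NumPy array the same selection can be written as `X[idx]`. Resampling should be applied to the training split only.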
0
votes
0
answers
15
views
Weighted F1-score
I'm training and validating models for a binary classification problem in a dataset that has great class imbalance.
When searching for metrics for evaluating the performance of the models, I found ...
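As a rough illustration of what a weighted F1-score measures, the sketch below computes the per-class F1 and averages it with each class's support as the weight. The function name and the labels are hypothetical; scikit-learn's `f1_score(..., average="weighted")` follows the same definition.

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    total = len(y_true)
    score = 0.0
    for c in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of true samples of class c
        score += f1 * support / total
    return score

# Hypothetical example: 8 negatives, 2 positives, and a model that
# always predicts the majority class.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
score = weighted_f1(y_true, y_pred)
```

In this example the minority class has an F1 of 0, yet the weighted score is still about 0.71, because the majority class carries 80% of the weight. That is worth keeping in mind when the minority class is the one of interest.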
0
votes
0
answers
22
views
Class imbalance calculation for each class in a dataset
I am trying to compute class imbalance in each dataset and my approach was to check the average and standard deviation of the counts. The average is the total number of samples in class 1 / total number ...
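A minimal sketch of one way to summarize class imbalance, assuming the labels are available as a plain sequence. The function name and the returned keys are hypothetical; the ratio of the largest to the smallest class count is a common single-number summary alongside the mean and standard deviation of the counts.

```python
from collections import Counter
from statistics import mean, pstdev

def class_balance_summary(labels):
    """Summarize how evenly samples are spread across classes."""
    counts = Counter(labels)
    total = sum(counts.values())
    values = list(counts.values())
    return {
        "counts": dict(counts),
        # Share of the dataset belonging to each class.
        "proportions": {c: n / total for c, n in counts.items()},
        "mean_count": mean(values),
        "std_count": pstdev(values),
        # Largest class size divided by smallest class size.
        "imbalance_ratio": max(values) / min(values),
    }

# Hypothetical labels for illustration.
summary = class_balance_summary(["a"] * 6 + ["b"] * 3 + ["c"] * 1)
```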
1
vote
2
answers
56
views
Does XGBoost's scale_pos_weight correctly balance the positive samples if the training dataset has more positive than negative samples?
After researching, I realized that scale_pos_weight is typically calculated as the ratio of the number of negative samples to the number of positive samples in the training data. My dataset has 840 ...
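The ratio described above can be sketched as follows. The label counts are hypothetical and are not the asker's numbers. When positives outnumber negatives the ratio falls below 1, which scales the positive class down rather than up.

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive samples, assuming labels are 0 or 1."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0:
        raise ValueError("no positive samples")
    return n_neg / n_pos

# Hypothetical training labels with more positives than negatives.
labels = [1] * 8 + [0] * 2
w = scale_pos_weight(labels)  # 2 / 8
```

The resulting value would then be passed as the `scale_pos_weight` parameter of the XGBoost classifier.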
0
votes
1
answer
52
views
Which parts of the Imbalanced Learn Pipeline are applied to the test set?
I have created an imbalanced-learn Pipeline consisting of RobustScaler, SMOTE-NC, RandomUndersampling and a Random Forest Classifier.
RandomizedSearchCV is used to select the best hyperparameters.
I ...
1
vote
1
answer
57
views
Class_weight parameter not impacting results in imbalanced dataset with RandomForestClassifier
I'm fairly new to ML and now I'm in the process of predicting employee attrition in a medium sized dataset. I have been able to run everything smoothly, but, as the dataset is imbalanced, I've been ...
0
votes
0
answers
16
views
Working with classWeight in model parameters for highly imbalanced datasets in pyspark
I am working on a binary classification problem with a highly imbalanced dataset (majority class 0: 523152826, and minority class 1: 2711142)
I tried the logistic regression model from pyspark.ml....
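A minimal sketch of how per-class weights could be derived from the counts quoted above, using the common "balanced" heuristic (total / (number of classes × class count)). The function name is hypothetical, and this is one possible weighting scheme rather than the asker's approach.

```python
def balanced_class_weights(counts):
    """Weights that give every class the same total weight."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}

# Class counts taken from the question excerpt.
counts = {0: 523152826, 1: 2711142}
weights = balanced_class_weights(counts)
```

In PySpark these values would typically be written into a new column per row, based on the label, and that column name passed as `weightCol` to `pyspark.ml.classification.LogisticRegression`.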
0
votes
0
answers
15
views
Using class_weight in model.fit() doesn't work
I have an imbalanced dataset and I would like to use class_weight in model.fit().
When I use model.fit() without class_weight, it works correctly, but if I add class_weight, I get an error.
My ...
0
votes
0
answers
43
views
How do I add a bias to the last layer in my model if my model outputs logits and not probabilities?
I'm working on a medical image binary segmentation problem using a U-Net in tensorflow, and my classes are extremely unbalanced (about 1 in 10,000). As a result, my model wastes a ton of time going ...
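When the last layer outputs logits, a common choice is to set its initial bias to the log-odds of the positive class, so that the sigmoid of the initial output matches the class prior. The sketch below assumes a positive fraction of 1 in 10,000, as stated in the excerpt; the function names are hypothetical.

```python
import math

def initial_logit_bias(positive_fraction):
    """Log-odds of the positive class, for use as the bias of a logit output."""
    if not 0.0 < positive_fraction < 1.0:
        raise ValueError("positive_fraction must be strictly between 0 and 1")
    return math.log(positive_fraction / (1.0 - positive_fraction))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Roughly 1 positive pixel in 10,000, per the question.
bias = initial_logit_bias(1 / 10_000)
```

In Keras, such a value could be supplied to the final layer through `bias_initializer=tf.keras.initializers.Constant(bias)`. Because the layer emits logits, no further transformation of the bias is needed.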
0
votes
0
answers
13
views
Use an external system-installed Scala library in Python in Databricks notebook
In the context of fixing an imbalanced dataset in pyspark, I found the following external library in scala which is similar to SMOTE for imbalanced data:
I installed it on my system with > $...
0
votes
0
answers
22
views
Highly imbalanced pyspark dataset
I have a highly imbalanced PySpark dataset (523148956 for the majority class vs 2722245 for the minority class) and I would like to apply techniques to balance it without having to convert it to pandas.
Can ...
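One option that stays inside Spark is random undersampling of the majority class. The sketch below only computes the per-class sampling fractions from the counts quoted above; the function name and the label values 0 and 1 are assumptions made for illustration.

```python
def undersample_fractions(counts):
    """Fraction of each class to keep so every class shrinks to the smallest one."""
    smallest = min(counts.values())
    return {label: smallest / n for label, n in counts.items()}

# Counts from the question excerpt; labels 0 and 1 are assumed.
counts = {0: 523148956, 1: 2722245}
fractions = undersample_fractions(counts)
```

These fractions could then be passed to `DataFrame.sampleBy`, for example `df.sampleBy("label", fractions=fractions, seed=42)`, where `"label"` is a hypothetical column name. The resulting class sizes are approximate, because the sampling is random per row.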