All Questions
Tagged with r data.table
13,715 questions
1 vote, 1 answer, 63 views
How can I append duplicated groups of a dataset with changes to existing data in R efficiently?
I have data like this:
key_data <- data.frame(orig_letter=c("A","A","A","A","C","C","F","B","B","B"...
0 votes, 2 answers, 36 views
Summarize multiple columns using colSums2 and lapply
I'm trying to summarize some columns using the matrixStats package and lapply inside data.table
library(data.table)
library(matrixStats)
set.seed(100)
n = 100
dd = data.table(row_id = 1:n, person=rep(...
0 votes, 1 answer, 46 views
sumifs with a criteria
Here is my data.
iris.dt <- as.data.table(iris)
And this is exactly what I want, and it works fine.
iris.dt[, Sum.Petal.Width := c(sum(Petal.Width), rep(0, .N - 1)), by = Species]
Output to verify:
...
0 votes, 0 answers, 24 views
mean of all or a subset of columns when subsetting by group using 'j' term in data.table [duplicate]
I have a large data.table where the columns represent many variables (v1... v99) and the ID for each set of variables. I would like to know the mean of each variable for each individual ID.
If this ...
0 votes, 0 answers, 37 views
Why does chaining and assignment require you to enter the variable twice in console before it appears? [duplicate]
I have an example below that results in nothing appearing in the last line when it is run in my RStudio console. However, running it again the second time shows what final_counts is. Why does this ...
0 votes, 2 answers, 32 views
How to Calculate Average Time from First Activity to a Milestone in a User Activity Log Using data.table in R?
I'm working with a dataset of user activity logs and need to calculate the average time it takes for first-time users to reach a specific milestone. Specifically, I want to find the average time it ...
0 votes, 1 answer, 23 views
How to create a variable based on unique counts within a time interval by multiple time points and grouping variable?
I would like to count the unique number of drugs, defined as the number of unique drug_code dispensations each individual (noted by idnr) has within 1 year prior to the index_date + time_from_index. The ...
0 votes, 0 answers, 57 views
fread() takes 60GB of RAM to load a 22GB CSV dataset [duplicate]
I am loading a CSV file into RStudio using fread(), and although the file is 22 GB, my memory usage reaches 60 GB of my 64 GB. Why is that? This becomes a problem right after as I need to join ...
2 votes, 4 answers, 112 views
data.table vs dplyr: apply function returning changing column names over groups
I want to apply a function (ratefunc()) to a grouped data frame which returns changing column names dependent on the result:
library(data.table)
library(dplyr)
dt <- data.table::data.table(
...
0 votes, 2 answers, 60 views
calculating count and sum with a condition and on multiple category
This is my data.
irisData <- as.data.table(iris)
bins <- seq(4, length.out = 9, by = 0.5)
aggtable <- data.table(bin1 = bins[-length(bins)], bin2 = bins[-1])
I would like to create a count and ...
1 vote, 2 answers, 23 views
How to select specific columns across multiple dataframes in R and then bind them into one data.frame?
I am trying to select or subset multiple data frames with different numbers of columns. They all contain the same columns of interest, so I am trying to make them all contain the same columns so I can ...
0 votes, 0 answers, 36 views
Sum of previous years observations for unstructured data in R [closed]
I have very unstructured data for the following variables:
Host Home Industry Value Year value_lag
A X I 1 2001 NA
B X I 2 2001 NA
C X I 3 2003 NA
A X I ...
1 vote, 2 answers, 44 views
Recode relationship matrices based on new subgrouping
Problem:
I have a survey dataset that includes intra-household relationships. I had to subdivide households into tax units, which means I need to redefine the relationship matrices based on the new tax-...
1 vote, 3 answers, 89 views
How to calculate conditional counts and sums?
I want to do something similar to COUNTIFS and SUMIFS in R using the data.table package.
Here is my data.
library(data.table)
treesData <- as.data.table(trees)
bins <- seq(63, length.out = 10, by = 3)
...
3 votes, 2 answers, 90 views
Improve processing time of applying a function over a vector and grouping by columns
I am trying to apply a function over data.table columns, grouping by column values.
I am using the lapply function, but my script is quite slow.
To give some context, I am working of probability ...