Using the approx function for several variables within 2 groups

Question

I have a data frame 'd' in which sex is 1 or 2 and age in days (agedays) is either 746 (~24.5 months) or 776 (~25.5 months):

d = structure(list(sex = c(1L, 1L, 2L, 2L), agemos = c(24.5, 25.5, 24.5, 25.5), 
    l = c(-0.216501213, -0.239790488, -0.75220657, -0.78423366), 
    m = c(12.74154396, 12.88102276, 12.13455523, 12.2910249), 
    s = c(0.108166006, 0.108274706, 0.107740345, 0.10847701), 
    agedays = c(746, 776, 746, 776)), row.names = c(NA, -4L), class = "data.frame")
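The two day values match the month ages if you assume an average month of 365.25/12 = 30.4375 days. The question does not say how agedays was derived, so treat that conversion as an assumption. A quick base-R check:

```r
# Assumption: agedays was derived from agemos using an average month
# of 365.25 / 12 = 30.4375 days (the question does not state this).
agemos <- c(24.5, 25.5, 24.5, 25.5)
days_per_month <- 365.25 / 12
agedays_check <- round(agemos * days_per_month)
print(agedays_check)  # [1] 746 776 746 776
```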

I'd like to interpolate the values of l, m, and s for each day between 746 and 776, and do this separately for each sex. My function for this is:

fx = function(d){
  # Linearly interpolate one column onto a daily grid from day 746 to 776
  fapp <- function(v) approx(d$agedays, v, xout = 746:776)$y
  lapply(d[, c('sex', 'agedays', 'l', 'm', 's')], fapp)
}
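fx relies on approx(), which by default (method = "linear") interpolates linearly between the supplied points. For one variable and one sex it reduces to the sketch below; the m values are copied from d above, and the midpoint check only illustrates the linearity.

```r
# Linear interpolation of m for sex 1 between day 746 and day 776.
agedays <- c(746, 776)
m_sex1  <- c(12.74154396, 12.88102276)

m_daily <- approx(agedays, m_sex1, xout = 746:776)$y

length(m_daily)  # 31: one value per day
# Day 761 is halfway between the endpoints, so m there is their mean.
all.equal(m_daily[746:776 == 761], mean(m_sex1))  # TRUE
```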

I can do this using group_split in dplyr, and then bind the rows:

library(dplyr)
x <- d |> group_split(sex) |> lapply(fx)
str(x)
d <- bind_rows(x, .id = 'sex')

which results in a tibble that looks like

# A tibble: 62 × 5
   sex   agedays      l     m     s
   <chr>   <dbl>  <dbl> <dbl> <dbl>
 1 1         746 -0.217  12.7 0.108
 2 1         747 -0.217  12.7 0.108
 3 1         748 -0.218  12.8 0.108
 4 1         749 -0.219  12.8 0.108
 5 1         750 -0.220  12.8 0.108
 6 1         751 -0.220  12.8 0.108
 7 1         752 -0.221  12.8 0.108
 8 1         753 -0.222  12.8 0.108
 9 1         754 -0.223  12.8 0.108
10 1         755 -0.223  12.8 0.108
# ℹ 52 more rows

This is the correct result, but I'm concerned because ?group_split says that it is 'not stable because you can achieve very similar results by manipulating the nested column returned from tidyr::nest(.by =)'.

I haven't been able to do this with 'nest'. Can anyone help? I've since realized that I don't need 'group_split', because base 'split' works too, but I'm still interested in a 'nest' solution.
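The base 'split' route mentioned above could look like the following minimal base-R sketch. It recreates d from the question, and fx_df is a hypothetical helper name: a variant of fx that returns a data frame per sex instead of a list, so the pieces can be combined with rbind().

```r
d <- data.frame(
  sex = c(1L, 1L, 2L, 2L),
  agemos = c(24.5, 25.5, 24.5, 25.5),
  l = c(-0.216501213, -0.239790488, -0.75220657, -0.78423366),
  m = c(12.74154396, 12.88102276, 12.13455523, 12.2910249),
  s = c(0.108166006, 0.108274706, 0.107740345, 0.10847701),
  agedays = c(746, 776, 746, 776)
)

# Hypothetical helper: interpolate l, m, s onto a daily grid within one sex.
fx_df <- function(g) {
  days <- 746:776
  fapp <- function(v) approx(g$agedays, v, xout = days)$y
  data.frame(
    sex = g$sex[1],  # recycled to one value per day
    agedays = days,
    l = fapp(g$l), m = fapp(g$m), s = fapp(g$s)
  )
}

# split() gives one data frame per sex; rbind() stacks the results.
out <- do.call(rbind, lapply(split(d, d$sex), fx_df))
rownames(out) <- NULL
head(out)
```

Unlike bind_rows(x, .id = 'sex'), this keeps sex as the original integer column rather than a character index.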

Thanks very much

Axeman · Accepted Answer · 2024-07-09 20:09:13Z


fx doesn't need to interpolate sex, and it should probably return a data.frame. Then you can nest, mutate and unnest:

fx = function(d){
  fapp <- function(v) approx(d$agedays,v,xout=746:776)$y
  as.data.frame(lapply(d[,c('agedays','l','m','s')],fapp))
}

library(dplyr)
library(tidyr)

d |> 
  nest(.by = sex) |> 
  mutate(outcome = lapply(data, fx)) |> 
  select(-data) |> 
  unnest(outcome)

Perhaps a more natural dplyr way to solve this would be to use reframe with across (reframe() and the .by argument require dplyr 1.1.0 or later), like so:

reframe(
  d,
  across(c(agedays, l, m, s), \(v) approx(agedays, v, xout = 746:776)$y), 
  .by = sex
)


The only thing I'd have changed is to use seq(range(agedays)[1], range(agedays)[2]) instead of hard-coding xout.
– M--, commented Jul 9 at 21:37
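M--'s suggestion can be folded into fx so that the daily grid comes from each group's own age range rather than the hard-coded 746:776. A minimal base-R sketch follows; the test group g is hypothetical, chosen with a different range to show the grid adapting. Note that seq(range(x)[1], range(x)[2]) is the same as seq(min(x), max(x)).

```r
fx <- function(d) {
  # Build the daily grid from this group's own range instead of 746:776.
  days <- seq(range(d$agedays)[1], range(d$agedays)[2])
  fapp <- function(v) approx(d$agedays, v, xout = days)$y
  as.data.frame(lapply(d[, c("agedays", "l", "m", "s")], fapp))
}

# Hypothetical group spanning days 700 to 710.
g <- data.frame(agedays = c(700, 710), l = c(-0.2, -0.3),
                m = c(12, 13), s = c(0.10, 0.11))
res <- fx(g)
nrow(res)  # 11
```

The same fx drops straight into the nest()/mutate()/unnest() pipeline in the answer.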
