dplyr identify rows that do not have a matched column within group

Question

I have a df that looks as follow:

GROUP    NAME     TITLE     LEVEL
  A      John      Lead       0
  A      John      Staff      1
  A      Jake      Jr         0
  B      Bob       Lead       1
  B      Bob       Lead       0
  C      Andrew    Jr         0
  C      Andrew    Jr         1
  C      Rebecca   Staff      0

What I am trying to do is identify names within each group that have a title in level 1 that they do not have for level 0. In this example I would be looking for:

GROUP    NAME     TITLE     LEVEL
  A      John      Staff      1

Because in group A John has a title of staff for level 1 but not for level 0.

Jon Spring · Accepted Answer · 2024-07-09 16:48:02Z

2

The simplest approach I can think of would be an anti-join, where we look for GROUP/NAME/TITLE in Level 1 that is not in Level 0.

anti_join(df |> filter(LEVEL == 1),
          df |> filter(LEVEL == 0),
          join_by(GROUP, NAME, TITLE))

Result

# A tibble: 1 × 4
  GROUP NAME  TITLE LEVEL
  <chr> <chr> <chr> <dbl>
1 A     John  Staff     1

answered Jul 9 at 16:48

Jon Spring

62.7k4 gold badges38 silver badges61 bronze badges

Add a comment |

Maël · Accepted Answer · 2024-07-09 16:24:18Z

1

Within each name, there should be:

At least one observation for each levels of LEVEL (0 and 1).
At least one (length(...) != 0) TITLE among the LEVEL == 1 that is different (setdiff) from any of the TITLEs with LEVEL == 0.

df |> 
  filter(any(LEVEL == 0) & LEVEL == 1 &
         length(setdiff(TITLE[LEVEL == 1], TITLE[LEVEL == 0])) != 0,
         .by = NAME)

#   GROUP NAME TITLE LEVEL
# 1     A John Staff     1

edited Jul 9 at 16:24

answered Jul 9 at 16:18

Maël

50.9k4 gold badges40 silver badges76 bronze badges

Recognized by R Language Collective

Add a comment |

Nir Graham · Accepted Answer · 2024-07-09 16:23:40Z

I did it like so :

library(tidyverse)
step_1 <- read_csv("GROUP,NAME, TITLE, LEVEL
A,John,Lead, 0
A,John,Staff,1
A,Jake,Jr, 0
B,Bob, Lead, 1
B,Bob, Lead, 0
C,Andrew,Jr, 0
C,Andrew,Jr, 1
C,Rebecca, Staff,0")

# identify names within each group that have a title in level 1 that they do not have for level 0

step_1 |> split(~GROUP) |> map(\(grp){
grp |> pivot_wider(names_from="LEVEL",
                   values_from="TITLE") |>
    filter(`0`!=`1`)
}) |> list_rbind()

#  A tibble: 1 × 4
#  GROUP NAME  `0`   `1`  
#  <chr> <chr> <chr> <chr>
1 A     John  Lead  Staff

Onyambu · Accepted Answer · 2024-07-10 02:31:16Z

0

use n_distinct:

df %>% 
  filter(n_distinct(LEVEL) == 1 & LEVEL == 1, .by =c(GROUP, NAME, TITLE))

  GROUP NAME TITLE LEVEL
1     A John Staff     1

answered Jul 10 at 2:31

Onyambu

77.4k3 gold badges27 silver badges59 bronze badges

Add a comment |

Collectives™ on Stack Overflow

dplyr identify rows that do not have a matched column within group

4 Answers 4

Not the answer you're looking for? Browse other questions tagged
r
dplyr
or ask your own question.

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Not the answer you're looking for? Browse other questions tagged rdplyr or ask your own question.

Related

Not the answer you're looking for? Browse other questions tagged
r
dplyr
or ask your own question.