Convert a dataframe of nearest neighbors to one-hot coding

Question

Let's say we took the mtcars data and ran a PCA. We then want to know which car brands are most similar in PC space, i.e., each car's nearest neighbors. So someone ran a nearest-neighbors analysis and recorded the results.

Then I am given a dataframe like the one below, with the focal cars in the car column and their first and second nearest neighbors in the nn1 and nn2 columns:

library(tibble)
df <- tibble(car = c("Honda", "Toyota", "Mazda", "Fiat", "Lotus"),
             nn1 = c("Toyota", "Honda", "Toyota", "Lotus", "Mazda"),
             nn2 = c("Mazda", "Mazda", "Honda", "Honda", "Fiat"))
df
# A tibble: 5 × 3
  car    nn1    nn2  
  <chr>  <chr>  <chr>
1 Honda  Toyota Mazda
2 Toyota Honda  Mazda
3 Mazda  Toyota Honda
4 Fiat   Lotus  Honda
5 Lotus  Mazda  Fiat

I would like to convert this to a one-hot style dataframe, where the 5 focal car brands are the rows and the columns are the possible neighbors, each encoded as 1 if that car is one of the focal car's nearest neighbors and 0 otherwise. As a tibble, it would look like this:

# A tibble: 5 × 6
  cars   Honda Toyota Mazda  Fiat Lotus
  <chr>  <dbl>  <dbl> <dbl> <dbl> <dbl>
1 Honda      0      1     1     0     0
2 Toyota     1      0     1     0     0
3 Mazda      1      1     0     0     0
4 Fiat       1      0     0     0     1
5 Lotus      0      0     1     1     0

or it could be a dataframe like this:

       Honda Toyota Mazda Fiat Lotus
Honda      0      1     1    0     0
Toyota     1      0     1    0     0
Mazda      1      1     0    0     0
Fiat       1      0     0    0     1
Lotus      0      0     1    1     0

Gregor Thomas · Accepted Answer · 2024-07-08 20:59:32Z

3

More of an adjacency matrix than a one-hot encoding matrix. Calling your data df:

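library(tidyr)
library(dplyr)
df |>
  # long format: one row per (car, neighbor) pair
  pivot_longer(-car) |>
  # mark every observed pair with a 1
  mutate(fill = 1) |>
  # one column per neighbor; pairs that never occur are filled with 0
  pivot_wider(id_cols = car, names_from = value, values_from = fill, values_fill = 0)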
# # A tibble: 5 × 6
#   car    Toyota Mazda Honda Lotus  Fiat
#   <chr>   <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Honda       1     1     0     0     0
# 2 Toyota      0     1     1     0     0
# 3 Mazda       1     0     1     0     0
# 4 Fiat        0     0     1     1     0
# 5 Lotus       0     1     0     0     1
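The neighbor columns above come out in order of first appearance (Toyota, Mazda, Honda, ...), not in the order of the rows. If the layout from the question matters, here is a minimal sketch that reorders them with select(). It assumes, as in this example, that every focal car also appears at least once as a neighbor, because all_of() errors on a column that does not exist; the name adj is just illustrative.

```r
library(tibble)
library(tidyr)
library(dplyr)

df <- tibble(car = c("Honda", "Toyota", "Mazda", "Fiat", "Lotus"),
             nn1 = c("Toyota", "Honda", "Toyota", "Lotus", "Mazda"),
             nn2 = c("Mazda", "Mazda", "Honda", "Honda", "Fiat"))

adj <- df |>
  # one row per (focal car, neighbor) pair
  pivot_longer(-car) |>
  mutate(fill = 1) |>
  # one column per neighbor; absent pairs become 0
  pivot_wider(id_cols = car, names_from = value, values_from = fill,
              values_fill = 0) |>
  # put the neighbor columns in the same order as the focal cars;
  # all_of() errors if some car never appears as a neighbor
  select(car, all_of(df$car))

adj
```

Because rows and columns now share one order, the diagonal (a car compared with itself) is all zeros, which makes the adjacency-matrix reading easy to check by eye.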

answered Jul 8 at 20:59


1

adjacency matrix is a new one for me, thanks for the clean answer!
– jnat
Commented Jul 8 at 21:05


M-- · Accepted Answer · 2024-07-08 21:40:44Z

3

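# melt() to long format (car, variable, value), drop the variable column,
# then cross-tabulate car against neighbor
as.data.frame.matrix(table(reshape2::melt(df, id.vars = "car")[-2]))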

#>        Fiat Honda Lotus Mazda Toyota
#> Fiat      0     1     1     0      0
#> Honda     0     0     0     1      1
#> Lotus     1     0     0     1      0
#> Mazda     0     1     0     0      1
#> Toyota    0     1     0     1      0
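The rows and columns of this table are alphabetical because table() turns each vector into a factor, and factor levels are sorted by default. A base-R sketch (no reshape2) that fixes the levels explicitly so that rows and columns follow the order used in the question; brands, focal, neighbour, tab and adj are illustrative names only.

```r
df <- data.frame(car = c("Honda", "Toyota", "Mazda", "Fiat", "Lotus"),
                 nn1 = c("Toyota", "Honda", "Toyota", "Lotus", "Mazda"),
                 nn2 = c("Mazda", "Mazda", "Honda", "Honda", "Fiat"),
                 stringsAsFactors = FALSE)

brands <- df$car  # desired row and column order

# long format: repeat the focal cars once per neighbor column
focal     <- rep(df$car, times = 2)
neighbour <- c(df$nn1, df$nn2)

# explicit levels keep the question's order instead of sorting
tab <- table(factor(focal, levels = brands),
             factor(neighbour, levels = brands))

adj <- as.data.frame.matrix(tab)
adj
```

Fixing the levels also guarantees a column for every brand, even one that never appears as anyone's neighbor (it would simply be all zeros).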

answered Jul 8 at 21:40


LMc · Accepted Answer · 2024-07-09 14:55:52Z

The dummy_cols() function from the fastDummies package will create these columns if you first concatenate the neighbor columns:

library(fastDummies)
library(tidyr)

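df |>
  # paste nn1 and nn2 into one "Toyota_Mazda"-style column, keeping the originals
  unite("dummy", starts_with("nn"), remove = FALSE) |>
  # split that column on "_" and create one 0/1 column per brand
  dummy_cols("dummy", split = "_", omit_colname_prefix = TRUE, remove_selected_columns = TRUE)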

Output

  car    nn1    nn2   Honda Mazda Lotus  Fiat Toyota
  <chr>  <chr>  <chr> <int> <int> <int> <int>  <int>
1 Honda  Toyota Mazda     0     1     0     0      1
2 Toyota Honda  Mazda     1     1     0     0      0
3 Mazda  Toyota Honda     1     0     0     0      1
4 Fiat   Lotus  Honda     1     0     1     0      0
5 Lotus  Mazda  Fiat      0     1     0     1      0
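The unite() plus split = "_" combination works because the two neighbor columns are pasted into one underscore-separated string and then split back apart into indicator columns. A base-R sketch of that same split-and-indicate idea, for illustration only: it is not meant to mirror how fastDummies works internally, and, like the answer above, it assumes no brand name contains an underscore. The names combined, parts, brands and ind are hypothetical.

```r
df <- data.frame(car = c("Honda", "Toyota", "Mazda", "Fiat", "Lotus"),
                 nn1 = c("Toyota", "Honda", "Toyota", "Lotus", "Mazda"),
                 nn2 = c("Mazda", "Mazda", "Honda", "Honda", "Fiat"),
                 stringsAsFactors = FALSE)

# like unite(): one "nn1_nn2" string per row, e.g. "Toyota_Mazda"
combined <- paste(df$nn1, df$nn2, sep = "_")

# like split = "_": back to a character vector of neighbors per row
parts <- strsplit(combined, "_", fixed = TRUE)

# one 0/1 column per brand that occurs anywhere as a neighbor
brands <- sort(unique(unlist(parts)))
ind <- t(vapply(parts, function(p) as.integer(brands %in% p),
                integer(length(brands))))
colnames(ind) <- brands

cbind(df, ind)
```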

ThomasIsCoding · Accepted Answer · 2024-07-10 21:48:16Z

3

Maybe you can try table(), as below:

> with(df, table(rep(car, each = ncol(df) - 1), t(df[-1])))

         Fiat Honda Lotus Mazda Toyota
  Fiat      0     1     1     0      0
  Honda     0     0     0     1      1
  Lotus     1     0     0     1      0
  Mazda     0     1     0     0      1
  Toyota    0     1     0     1      0

or

> with(df, table(data.frame(x = car, y = c(nn1, nn2))))
        y
x        Fiat Honda Lotus Mazda Toyota
  Fiat      0     1     1     0      0
  Honda     0     0     0     1      1
  Lotus     1     0     0     1      0
  Mazda     0     1     0     0      1
  Toyota    0     1     0     1      0

or, as @thelatemail suggested in the comments:

> table(cbind(df[1], unlist(df[-1])))
        unlist(df[-1])
car      Fiat Honda Lotus Mazda Toyota
  Fiat      0     1     1     0      0
  Honda     0     0     0     1      1
  Lotus     1     0     0     1      0
  Mazda     0     1     0     0      1
  Toyota    0     1     0     1      0
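The first variant relies on how R flattens matrices: t(df[-1]) is a 2 × 5 matrix with one column per focal car, and R reads a matrix column by column, so each car's two neighbors come out consecutively and line up with rep(car, each = 2). A small sketch that makes that pairing visible, using a plain data.frame; focal, neighbour and pairs are illustrative names.

```r
df <- data.frame(car = c("Honda", "Toyota", "Mazda", "Fiat", "Lotus"),
                 nn1 = c("Toyota", "Honda", "Toyota", "Lotus", "Mazda"),
                 nn2 = c("Mazda", "Mazda", "Honda", "Honda", "Fiat"),
                 stringsAsFactors = FALSE)

# each focal car repeated once per neighbor column
focal <- rep(df$car, each = ncol(df) - 1)

# transposing puts each car's neighbors in one column;
# as.vector() then reads the matrix column by column
neighbour <- as.vector(t(as.matrix(df[-1])))

pairs <- data.frame(focal, neighbour, stringsAsFactors = FALSE)
head(pairs, 4)
table(pairs)
```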

edited Jul 10 at 21:48

answered Jul 8 at 21:13


3

Variation on a theme table(cbind(df[1], unlist(df[-1]))) or table(cbind(df[1], neighbour=unlist(df[-1]))) if you want nice names.
– thelatemail
Commented Jul 8 at 21:33
2

@thelatemail if you post that, I know that I will upvote.
– M--
Commented Jul 8 at 21:43
@thelatemail yes, thanks for your contribution, which seems more concise :)
– ThomasIsCoding
Commented Jul 8 at 21:45
