How to write a good R question with a reproducible example

Part of R Language Collective

This article is largely based on https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

When posting a question to Stack Overflow, you should include a minimal reproducible example (MRE) which enables others to exactly reproduce your issue on their machines.

In short:
  1. Provide a minimal dataset that is just large enough to demonstrate the problem, either by copying and pasting it into your script or, preferably, by using the dput() function to generate R code that recreates it.
  2. Share minimal, runnable code that reproduces the issue on the given dataset, including the list of packages it requires.
  3. Describe your desired output in a clear and concise manner.
Here is an example:

How to get the conditional sum of two columns?

I am trying to get the sum of two columns in a data frame, with two additional conditions:

  • return zero for the rows that have NA in one of the columns
  • return NA for the rows where both columns are NA.

Here's what I have so far:

# Required packages
library(dplyr)
library(tidyr)

# Data
df1 <- data.frame(x = c(1, 2, NA, 4, NA), y = c(6, NA, 8, 9, NA))

# Adding a column with the sum of x and y
df1 %>%
  mutate(sum = replace_na(x + y, 0))

#>    x  y  sum
#> 1  1  6    7
#> 2  2 NA    0
#> 3 NA  8    0
#> 4  4  9   13
#> 5 NA NA    0

How can I apply an if-statement to get my desired output as shown below?

    x        y      sum
    1        6        7
    2       NA        0
   NA        8        0
    4        9       13
   NA       NA       NA
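
The data in the example question above is small enough to type by hand. For an object that already exists in your session, dput() can write the code for you. The following is a minimal sketch using the df1 data frame from the example; the name recreated is only illustrative, and the exact text that dput() prints can differ slightly between R versions.

```r
# Data from the example question above
df1 <- data.frame(x = c(1, 2, NA, 4, NA), y = c(6, NA, 8, 9, NA))

# Print R code that recreates df1; paste this output into your question
dput(df1)

# For larger data, share only a small subset (here, the first three rows)
dput(head(df1, 3))

# Sanity check: evaluating the dput() output should rebuild the same data
recreated <- eval(parse(text = capture.output(dput(df1))))
all.equal(df1, recreated)
```

Readers can then assign the pasted structure(...) call to a name of their choice and run your code against the same data you have.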

There are a few other things that can improve your question:
  • Choose a clear and explanatory title for your question. Review How do I write a good title?
  • Keep your example, both data and code, minimal while making sure it still reproduces the behavior, errors, etc. that you are seeing on your end.
  • Include comments within your code to explain what you are trying to achieve.
  • Format your question properly.
  • Use appropriate tags. Remember to at least read the mouseover tooltip text on the tags you are using when asking a question.
  • If you are using functions that are generating random numbers, use set.seed() to ensure reproducibility.
  • If you have issues that are specific to your environment, include the version of R that you are using, the operating system that you are running on, and any other relevant information about your environment. Providing the output of sessionInfo() is helpful for these cases.
  • You can also create a reproducible example with the help of the reprex package.
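
As a rough illustration of the set.seed() and sessionInfo() points above, the sketch below uses base R only. The seed value 42 and the names df2 and value_again are arbitrary choices made for this example.

```r
# Fix the random seed so that everyone draws the same "random" numbers
set.seed(42)
df2 <- data.frame(id = 1:5, value = rnorm(5))

# Resetting the same seed reproduces exactly the same draws
set.seed(42)
value_again <- rnorm(5)
identical(df2$value, value_again)

# Environment details worth including for environment-specific issues
R.version.string  # the R version as a single line of text
sessionInfo()     # R version, operating system, and attached packages
```

If the reprex package is installed, copying a snippet like this to the clipboard and calling reprex::reprex() renders the code together with its output, ready to paste into a question.
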
Moreover, you should avoid:
  • Including details that are not necessary to address your issue.
  • Pasting your code, data, or errors as images.
  • Sharing your code or data through external links.
  • Asking a question before doing any research. Questions that show no attempt at solving the problem, are already answered here on Stack Overflow, and/or can easily be addressed through a simple search on the Internet tend to get negative feedback.
  • Combining multiple questions into one post, which makes the post too broad.
  • Asking questions focused on statistics or data science. While there is some overlap between Stack Overflow and other technical communities, if your question is not about practical programming, you should explore other Stack Exchange communities, read their help pages, and consider posting to them instead of Stack Overflow.
Here are some additional helpful resources:
  • 4
    Disclaimer: I am the original author of this article, but had it disassociated from my account to avoid getting reps from it as this has been pinned and can cause inorganic reputation gain.
    – M--
    Commented Jan 11 at 21:09