Questions tagged [dataframe]

A data frame is a two-dimensional tabular data structure. Typically, rows are observations and columns are variables, and the columns may be of different types (unlike an array or matrix). "Data frame" or "dataframe" is the term used for this concept in several languages and libraries (R, Apache Spark, Deedle, Maple, the pandas library in Python, and the DataFrames library in Julia), while "table" is the term used in MATLAB and SQL.

1547 votes
14 answers
1.8m views

How to join (merge) data frames (inner, outer, left, right)

Given two data frames: df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3))) df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1))) ...
Dan Goldstein's user avatar
251 votes
9 answers
335k views

Reshaping data.frame from wide to long format

I am having trouble converting my data.frame from a wide table to a long table. At the moment it looks like this: Code Country 1950 1951 1952 1953 1954 AFG Afghanistan 20,249 ...
mropa's user avatar
  • 11.7k
1439 votes
26 answers
2.4m views

How to deal with SettingWithCopyWarning in Pandas

Background: I just upgraded my Pandas from 0.11 to 0.13.0rc1. Now the application is emitting many new warnings. One of them looks like this: E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value ...
bigbug's user avatar
  • 58.5k
517 votes
19 answers
998k views

How to sum a variable by group

I have a data frame with two columns. First column contains categories such as "First", "Second", "Third", and the second column has numbers that represent the number of times I saw the specific ...
boo-urns's user avatar
  • 10.3k
3548 votes
19 answers
6.6m views

How do I select rows from a DataFrame based on column values?

How can I select rows from a DataFrame based on values in some column in Pandas? In SQL, I would use: SELECT * FROM table WHERE column_name = some_value
szli's user avatar
  • 38.5k
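
As a rough illustration of the SQL-style filter asked about above, here is a minimal sketch using a boolean mask. The sample data and the `value` column are hypothetical; `column_name` and `some_value` simply mirror the placeholders in the question.

```python
import pandas as pd

# Hypothetical sample data; only the placeholder names come from the question.
df = pd.DataFrame({"column_name": ["a", "b", "a"], "value": [1, 2, 3]})
some_value = "a"

# Boolean mask: True for rows whose column equals some_value.
mask = df["column_name"] == some_value

# Roughly equivalent to: SELECT * FROM table WHERE column_name = some_value
selected = df.loc[mask]
print(selected)
```

The mask is an ordinary boolean Series, so it can be combined with other conditions using `&` and `|`.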
271 votes
10 answers
384k views

How do I make a list of data frames?

How do I make a list of data frames and how do I access each of those data frames from the list? For example, how can I put these data frames in a list? d1 <- data.frame(y1 = c(1, 2, 3), ...
Ben's user avatar
  • 21.1k
4135 votes
34 answers
7.5m views

How can I iterate over rows in a Pandas DataFrame?

I have a pandas dataframe, df: c1 c2 0 10 100 1 11 110 2 12 120 How do I iterate over the rows of this dataframe? For every row, I want to access its elements (values in cells) by the name ...
Roman's user avatar
  • 129k
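
One possible way to do what the question above describes is sketched below with `itertuples`, which lets each row's values be read by column name. The `pairs` list is only there for illustration; vectorized operations are often preferable when they fit the task.

```python
import pandas as pd

# The small frame shown in the question.
df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# Collect (c1, c2) pairs row by row; `pairs` is a hypothetical name.
pairs = []
for row in df.itertuples(index=False):
    # Each row is a namedtuple, so columns are available as attributes.
    pairs.append((row.c1, row.c2))

print(pairs)
```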
500 votes
14 answers
848k views

How do I create a new column where the values are selected based on an existing column?

How do I add a color column to the following dataframe so that color='green' if Set == 'Z', and color='red' otherwise? Type Set 1 A Z 2 B Z 3 B X 4 C Y
user7289's user avatar
  • 33.7k
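
A minimal sketch of the conditional column described above, assuming `numpy.where` is an acceptable tool for a two-way choice. The frame is rebuilt from the four rows shown in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (index 1..4).
df = pd.DataFrame(
    {"Type": ["A", "B", "B", "C"], "Set": ["Z", "Z", "X", "Y"]},
    index=[1, 2, 3, 4],
)

# 'green' where Set == 'Z', 'red' everywhere else.
df["color"] = np.where(df["Set"] == "Z", "green", "red")
print(df)
```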
877 votes
12 answers
1.3m views

How to filter Pandas dataframe using 'in' and 'not in' like in SQL

How can I achieve the equivalents of SQL's IN and NOT IN? I have a list with the required values. Here's the scenario: df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']}) ...
LondonRob's user avatar
  • 77.2k
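
The IN / NOT IN pattern asked about above can be sketched with `Series.isin` and its negation. The question's own list of required values is cut off in the excerpt, so `countries_to_keep` below is a hypothetical stand-in.

```python
import pandas as pd

df = pd.DataFrame({"country": ["US", "UK", "Germany", "China"]})

# Hypothetical list of required values (the original is truncated).
countries_to_keep = ["UK", "China"]

in_mask = df["country"].isin(countries_to_keep)

df_in = df[in_mask]       # like SQL: WHERE country IN (...)
df_not_in = df[~in_mask]  # like SQL: WHERE country NOT IN (...)
```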
867 votes
15 answers
2.5m views

Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I want to filter my dataframe with an or condition to keep rows with a particular column's values that are outside the range [-0.25, 0.25]. I tried: df = df[(df['col'] < -0.25) or (df['col'] > 0....
obabs's user avatar
  • 8,981
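
The error in the title above typically appears when Python's `or` is applied to two Series. A minimal sketch of the element-wise alternative follows, using the range [-0.25, 0.25] stated in the question; the sample values are made up.

```python
import pandas as pd

# Hypothetical sample values for the 'col' column.
df = pd.DataFrame({"col": [-0.5, -0.1, 0.0, 0.3]})

# Use | (element-wise OR) instead of `or`, and parenthesize each
# comparison because | binds more tightly than < and >.
outside = df[(df["col"] < -0.25) | (df["col"] > 0.25)]
print(outside)
```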
181 votes
10 answers
190k views

Dynamically select data frame columns using $ and a character value

I have a vector of different column names and I want to be able to loop over each of them to extract that column from a data.frame. For example, consider the data set mtcars and some variable names ...
Samuel Song's user avatar
  • 2,145
395 votes
11 answers
932k views

How do I Pandas group-by to get sum?

I am using this dataframe: Fruit Date Name Number Apples 10/6/2016 Bob 7 Apples 10/6/2016 Bob 8 Apples 10/6/2016 Mike 9 Apples 10/7/2016 Steve 10 Apples 10/7/2016 Bob 1 Oranges ...
Trying_hard's user avatar
  • 9,441
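
A small sketch of a grouped sum, built only from the five complete rows visible in the excerpt above. Grouping by `Fruit` and `Name` is an assumption, since the excerpt is cut off before the desired output is stated.

```python
import pandas as pd

# Only the rows fully visible in the excerpt.
df = pd.DataFrame({
    "Fruit": ["Apples"] * 5,
    "Date": ["10/6/2016", "10/6/2016", "10/6/2016", "10/7/2016", "10/7/2016"],
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob"],
    "Number": [7, 8, 9, 10, 1],
})

# Sum Number within each (Fruit, Name) group; the grouping keys are assumed.
totals = df.groupby(["Fruit", "Name"], as_index=False)["Number"].sum()
print(totals)
```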
203 votes
10 answers
232k views

Aggregate / summarize multiple variables per group (e.g. sum, mean)

From a data frame, is there an easy way to aggregate (sum, mean, max, etc.) multiple variables simultaneously? Below are some sample data: library(lubridate) days = 365*2 date = seq(as.Date("2000-01-...
MikeTP's user avatar
  • 7,956
221 votes
16 answers
155k views

How to unnest (explode) a column in a pandas DataFrame, into multiple rows

I have the following DataFrame where one of the columns is an object (list type cell): df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]}) Output: A B 0 1 [1, 2] 1 2 [1, 2] My ...
BENY's user avatar
  • 322k
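
For the list-valued column shown above, one option in recent pandas versions (0.25 and later) is `DataFrame.explode`. The sketch assumes the goal is one row per list element, as the title suggests; the desired output itself is truncated in the excerpt.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [[1, 2], [1, 2]]})

# One row per element of each list in B; values in A are repeated.
exploded = df.explode("B")
print(exploded)
```

The original index labels are repeated in the result; passing `ignore_index=True` (pandas 1.1 and later) renumbers the rows instead.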
680 votes
11 answers
337k views

The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe

R provides two different methods for accessing the elements of a list or data.frame: [] and [[]]. What is the difference between the two, and when should I use one over the other?
Sharpie's user avatar
  • 17.6k
