# Making matplotlib scatter plots from dataframes in Python's pandas
What is the best way to make a series of scatter plots using `matplotlib` from a `pandas` dataframe in Python?

For example, if I have a dataframe `df` that has some columns of interest, I find myself typically converting everything to arrays:

```python
import matplotlib.pyplot as plt

# df is a DataFrame: fetch col1 and col2
# and drop rows where either column is NA
mydata = df[["col1", "col2"]].dropna(how="any")

# Now plot with matplotlib
vals = mydata.values
plt.scatter(vals[:, 0], vals[:, 1])
```

The problem with converting everything to arrays before plotting is that it forces you to break out of dataframes.

Consider these two use cases where having the full dataframe is essential to plotting:

1. What if you wanted to look at all the values of `col3` for the corresponding points you plotted in the call to `scatter`, and color (or size) each point by that value? You'd have to go back, pull out the non-NA values of `col1, col2`, and check what their corresponding `col3` values are.

   Is there a way to plot while preserving the dataframe? For example:

   ```python
   mydata = df.dropna(how="any", subset=["col1", "col2"])

   # Plot a scatter of col1 vs. col2, with point sizes taken from col3
   scatter(mydata[["col1", "col2"]], s=mydata["col3"])
   ```

2. Similarly, imagine that you wanted to filter or color each point differently depending on the values of some of its columns. E.g. what if you wanted to automatically plot, alongside the points that meet a certain cutoff on `col1, col2`, their labels (stored in another column of the df), or color these points differently, like people do with dataframes in R? For example:

   ```python
   mydata = df.dropna(how="any", subset=["col1", "col2"])
   myscatter = scatter(mydata[["col1", "col2"]], s=1)

   # Replot in red, with a smaller size, all the points
   # that have a col2 value greater than 0.5
   myscatter.replot(mydata["col2"] > 0.5, color="red", s=0.5)
   ```

How can this be done?

**EDIT:** Reply to crewbum:

You say that the best way is to plot each condition (like `subset_a`, `subset_b`) separately. What if you have many conditions, e.g. you want to split the scatters into 4 types of points or even more, plotting each in a different shape/color? How can you elegantly apply conditions a, b, c, etc. and make sure you then plot "the rest" (points not in any of these conditions) as the last step?

Similarly, in your example where you plot `col1, col2` differently based on `col3`, what if there are NA values that break the association between `col1, col2, col3`? For example, if you want to plot all `col2` values based on their `col3` values, but some rows have an NA value in either `col1` or `col3`, you are forced to use `dropna` first:

```python
mydata = df.dropna(how="any", subset=["col1", "col2", "col3"])
```

Then you can plot using `mydata` as you show, plotting the scatter between `col1, col2` using the values of `col3`. But `mydata` will be missing the points that have values for `col1, col2` but are NA for `col3`, and those still have to be plotted... so how would you plot "the rest" of the data, i.e. the points that are *not* in the filtered set `mydata`?
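To make that last point concrete, here is roughly what I end up writing by hand at the moment (the dataframe below is made up, just to have something runnable): drop only the rows missing `col1`/`col2`, size the points that also have a `col3` value by it, and then plot "the rest" (rows that are NA in `col3`) in a separate call so they are not silently lost. It works, but I am doing the bookkeeping that the dataframe should be doing for me:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the dataframe in the question
df = pd.DataFrame({
    "col1": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "col2": [5.0, 4.0, 3.0, 2.0, 1.0, 0.5],
    "col3": [10.0, np.nan, 30.0, 40.0, np.nan, 60.0],
})

# Drop rows only where the x/y columns are missing; col3 may still be NA
mydata = df.dropna(how="any", subset=["col1", "col2"])

# Split on whether col3 is available
has_col3 = mydata["col3"].notnull()
sized = mydata[has_col3]
rest = mydata[~has_col3]

# Points with a col3 value: size them by it
plt.scatter(sized["col1"], sized["col2"], s=sized["col3"])

# "The rest": rows with col1/col2 but NA in col3, plotted with a
# fixed size and color so they are not silently dropped
plt.scatter(rest["col1"], rest["col2"], s=10, color="red")

plt.show()
```

Is there a more dataframe-native way to express this kind of split, especially once there are 4 or more conditions rather than just "has `col3`" vs. "the rest"?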
 
