Getting the top n values by group not working as expected





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







0















Consider the following head(10) of a dataframe:



Consider the following head(10) of a dataframe



It is generated by this dplyr code:



Fuller_list %>% 
as.data.frame() %>%
select(from_infomap, topic) %>%
add_count(from_infomap) %>%
filter(from_infomap %in% coms_keep) %>%
group_by(from_infomap) %>%
add_count(topic) %>%
top_n(10, nn) %>%
head(10)


There are 36 different communities in the "from_infomap" column and 47 different topics in the "topic" column. Grouped by "from_infomap" the number of topics per community, for the first 5 communities, look like this:



enter image description here
I would like to show the top 10 most occurring topics per community, ordered descending. I tried to do that here with:



 group_by(from_infomap) %>%
add_count(topic) %>%
top_n(10, nn)


But if I plot that, it only returns the top 1 topic per community:



enter image description here



I'm not sure what I'm doing wrong. According to this stack overflow query, the weighted top_n(n,wt) function on the count should work, it should give the top 10 topics weighted by their count, grouped by community.



If anyone could perhaps suggest an alternative or point out where I'm going wrong, it would be greatly appreciated. Apologies for the small screenshots, I can't show the entire data.frame here, as it is quite large.



Thanks!



Edit: dput without the group_by, add_count and top_n:



n <- Fuller_list %>% 
as.data.frame() %>%
select(from_infomap, topic) %>%
add_count(from_infomap) %>%
filter(from_infomap %in% coms_keep) %>%
group_by(from_infomap)


dput(head(n,10)):



structure(list(from_infomap = c(1L, 1L, 1L, 3L, 3L, 3L, 4L, 4L, 
4L, 4L), topic = c("KnysnaFire_thanks_wofire", "Abramjee_caperelief_operationsa",
"Pick_n_Pay", "Plett_heavy_rain_snow", "Disasters_help_call",
"KFM_disasters_discussion", "Pick_n_Pay", "Pick_n_Pay", "Pick_n_Pay",
"Pick_n_Pay"), n = c(30512L, 30512L, 30512L, 6572L, 6572L, 6572L,
5030L, 5030L, 5030L, 5030L)), row.names = c(NA, -10L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), vars = "from_infomap", drop = TRUE, indices = list(
0:2, 3:5, 6:9), group_sizes = c(3L, 3L, 4L), biggest_group_size = 4L, labels = structure(list(
from_infomap = c(1L, 3L, 4L)), row.names = c(NA, -3L), class = "data.frame", vars = "from_infomap", drop = TRUE))


Issue should be reproducible by adding this code to the previous chunk:



  add_count(topic) %>%
top_n(10,nn) %>%
ungroup() %>%
ggplot(aes(x = fct_reorder(topic,nn),y = nn,fill = from_infomap))+
geom_col(width = 1)+
facet_wrap(~from_infomap, scales = "free")+
coord_flip()+
theme(plot.title = element_text("Central Players"),
plot.subtitle= element_text("Top 10 indegree centrality profiles of the 20 biggest communities.n Excluding 'starburst' communities."),
plot.caption = element_text("Source: Twitter"))+
theme_few()


Halway-Solution: So with the summarise method suggested by @s_t, we have the following code:



Fuller_list %>% 
as.data.frame() %>%
add_count(from_infomap) %>%
filter(from_infomap %in% coms_keep) %>%
group_by(from_infomap,topic) %>% # group by the topic and community
summarise(nn = n()) %>% # count the mentioned arguments
top_n(10, nn) %>%
ungroup() %>%
arrange(from_infomap, nn) %>%
ggplot(aes(x = fct_reorder(topic,nn),y = nn,fill = from_infomap))+
geom_col(width = 1)+
facet_wrap(~from_infomap, scales = "free")+
coord_flip()+
theme(plot.title = element_text("Central Players"),
plot.subtitle= element_text("Top 10 indegree centrality profiles of the 20 biggest communities.n Excluding 'starburst' communities."),
plot.caption = element_text("Source: Twitter"))+
theme_few()


And this produces: enter image description here



Which is the correct top_n(10) of the various communities. For all practical purposes, the plot now shows the correct data. The only remaining issue is that the arrange does not sort the various topics in desc order per community, but rather overall. Minor issue, would only improve aes if the topics could be arranged per community.










share|improve this question

























  • Could you post some usable data (not an image)?

    – s_t
    Nov 22 '18 at 7:58











  • Here are some tips on how: stackoverflow.com/questions/5963269/…

    – sindri_baldur
    Nov 22 '18 at 7:58











  • @s_t I've added a link to a downloadable sample of the code. I hope this is what you were looking for? drive.google.com/file/d/128R9Vgjd2QsFwHf0M5Yi8ltli2dsDsrJ/…

    – Petrus
    Nov 22 '18 at 8:11











  • @snoram, I would like to use dput to give a sample of the code, but the dataset is quite large (many variables), it won't be practically viewable here. I've provided a link to a 1000 line sample, I hope this is in order?

    – Petrus
    Nov 22 '18 at 8:14






  • 2





    @s_t Of course, I've added the output from dput(head()) below. Is that what you were looking for? Apologies, I'm not very experienced at this yet.

    – Petrus
    Nov 22 '18 at 8:21


















0















Consider the following head(10) of a dataframe:



Consider the following head(10) of a dataframe



It is generated by this dplyr code:



Fuller_list %>% 
as.data.frame() %>%
select(from_infomap, topic) %>%
add_count(from_infomap) %>%
filter(from_infomap %in% coms_keep) %>%
group_by(from_infomap) %>%
add_count(topic) %>%
top_n(10, nn) %>%
head(10)


There are 36 different communities in the "from_infomap" column and 47 different topics in the "topic" column. Grouped by "from_infomap" the number of topics per community, for the first 5 communities, look like this:



enter image description here
I would like to show the top 10 most occurring topics per community, ordered descending. I tried to do that here with:



 group_by(from_infomap) %>%
add_count(topic) %>%
top_n(10, nn)


But if I plot that, it only returns the top 1 topic per community:



enter image description here



I'm not sure what I'm doing wrong. According to this stack overflow query, the weighted top_n(n,wt) function on the count should work, it should give the top 10 topics weighted by their count, grouped by community.



If anyone could perhaps suggest an alternative or point out where I'm going wrong, it would be greatly appreciated. Apologies for the small screenshots, I can't show the entire data.frame here, as it is quite large.



Thanks!



Edit: dput without the group_by, add_count and top_n:



n <- Fuller_list %>% 
as.data.frame() %>%
select(from_infomap, topic) %>%
add_count(from_infomap) %>%
filter(from_infomap %in% coms_keep) %>%
group_by(from_infomap)


dput(head(n,10)):



structure(list(from_infomap = c(1L, 1L, 1L, 3L, 3L, 3L, 4L, 4L, 
4L, 4L), topic = c("KnysnaFire_thanks_wofire", "Abramjee_caperelief_operationsa",
"Pick_n_Pay", "Plett_heavy_rain_snow", "Disasters_help_call",
"KFM_disasters_discussion", "Pick_n_Pay", "Pick_n_Pay", "Pick_n_Pay",
"Pick_n_Pay"), n = c(30512L, 30512L, 30512L, 6572L, 6572L, 6572L,
5030L, 5030L, 5030L, 5030L)), row.names = c(NA, -10L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), vars = "from_infomap", drop = TRUE, indices = list(
0:2, 3:5, 6:9), group_sizes = c(3L, 3L, 4L), biggest_group_size = 4L, labels = structure(list(
from_infomap = c(1L, 3L, 4L)), row.names = c(NA, -3L), class = "data.frame", vars = "from_infomap", drop = TRUE))


Issue should be reproducible by adding this code to the previous chunk:



  add_count(topic) %>%
top_n(10,nn) %>%
ungroup() %>%
ggplot(aes(x = fct_reorder(topic,nn),y = nn,fill = from_infomap))+
geom_col(width = 1)+
facet_wrap(~from_infomap, scales = "free")+
coord_flip()+
theme(plot.title = element_text("Central Players"),
plot.subtitle= element_text("Top 10 indegree centrality profiles of the 20 biggest communities.n Excluding 'starburst' communities."),
plot.caption = element_text("Source: Twitter"))+
theme_few()


Halway-Solution: So with the summarise method suggested by @s_t, we have the following code:



Fuller_list %>% 
as.data.frame() %>%
add_count(from_infomap) %>%
filter(from_infomap %in% coms_keep) %>%
group_by(from_infomap,topic) %>% # group by the topic and community
summarise(nn = n()) %>% # count the mentioned arguments
top_n(10, nn) %>%
ungroup() %>%
arrange(from_infomap, nn) %>%
ggplot(aes(x = fct_reorder(topic,nn),y = nn,fill = from_infomap))+
geom_col(width = 1)+
facet_wrap(~from_infomap, scales = "free")+
coord_flip()+
theme(plot.title = element_text("Central Players"),
plot.subtitle= element_text("Top 10 indegree centrality profiles of the 20 biggest communities.n Excluding 'starburst' communities."),
plot.caption = element_text("Source: Twitter"))+
theme_few()


And this produces: enter image description here



Which is the correct top_n(10) of the various communities. For all practical purposes, the plot now shows the correct data. The only remaining issue is that the arrange does not sort the various topics in desc order per community, but rather overall. Minor issue, would only improve aes if the topics could be arranged per community.










share|improve this question

























  • Could you post some usable data (not an image)?

    – s_t
    Nov 22 '18 at 7:58











  • Here are some tips on how: stackoverflow.com/questions/5963269/…

    – sindri_baldur
    Nov 22 '18 at 7:58











  • @s_t I've added a link to a downloadable sample of the code. I hope this is what you were looking for? drive.google.com/file/d/128R9Vgjd2QsFwHf0M5Yi8ltli2dsDsrJ/…

    – Petrus
    Nov 22 '18 at 8:11











  • @snoram, I would like to use dput to give a sample of the code, but the dataset is quite large (many variables), it won't be practically viewable here. I've provided a link to a 1000 line sample, I hope this is in order?

    – Petrus
    Nov 22 '18 at 8:14






  • 2





    @s_t Of course, I've added the output from dput(head()) below. Is that what you were looking for? Apologies, I'm not very experienced at this yet.

    – Petrus
    Nov 22 '18 at 8:21














0












0








0








Consider the following head(10) of a dataframe:



Consider the following head(10) of a dataframe



It is generated by this dplyr code:



Fuller_list %>% 
as.data.frame() %>%
select(from_infomap, topic) %>%
add_count(from_infomap) %>%
filter(from_infomap %in% coms_keep) %>%
group_by(from_infomap) %>%
add_count(topic) %>%
top_n(10, nn) %>%
head(10)


There are 36 different communities in the "from_infomap" column and 47 different topics in the "topic" column. Grouped by "from_infomap" the number of topics per community, for the first 5 communities, look like this:



enter image description here
I would like to show the top 10 most occurring topics per community, ordered descending. I tried to do that here with:



 group_by(from_infomap) %>%
add_count(topic) %>%
top_n(10, nn)


But if I plot that, it only returns the top 1 topic per community:



enter image description here



I'm not sure what I'm doing wrong. According to this stack overflow query, the weighted top_n(n,wt) function on the count should work, it should give the top 10 topics weighted by their count, grouped by community.



If anyone could perhaps suggest an alternative or point out where I'm going wrong, it would be greatly appreciated. Apologies for the small screenshots, I can't show the entire data.frame here, as it is quite large.



Thanks!



Edit: dput without the group_by, add_count and top_n:



n <- Fuller_list %>% 
as.data.frame() %>%
select(from_infomap, topic) %>%
add_count(from_infomap) %>%
filter(from_infomap %in% coms_keep) %>%
group_by(from_infomap)


dput(head(n,10)):



structure(list(from_infomap = c(1L, 1L, 1L, 3L, 3L, 3L, 4L, 4L, 
4L, 4L), topic = c("KnysnaFire_thanks_wofire", "Abramjee_caperelief_operationsa",
"Pick_n_Pay", "Plett_heavy_rain_snow", "Disasters_help_call",
"KFM_disasters_discussion", "Pick_n_Pay", "Pick_n_Pay", "Pick_n_Pay",
"Pick_n_Pay"), n = c(30512L, 30512L, 30512L, 6572L, 6572L, 6572L,
5030L, 5030L, 5030L, 5030L)), row.names = c(NA, -10L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), vars = "from_infomap", drop = TRUE, indices = list(
0:2, 3:5, 6:9), group_sizes = c(3L, 3L, 4L), biggest_group_size = 4L, labels = structure(list(
from_infomap = c(1L, 3L, 4L)), row.names = c(NA, -3L), class = "data.frame", vars = "from_infomap", drop = TRUE))


Issue should be reproducible by adding this code to the previous chunk:



  add_count(topic) %>%
top_n(10,nn) %>%
ungroup() %>%
ggplot(aes(x = fct_reorder(topic,nn),y = nn,fill = from_infomap))+
geom_col(width = 1)+
facet_wrap(~from_infomap, scales = "free")+
coord_flip()+
theme(plot.title = element_text("Central Players"),
plot.subtitle= element_text("Top 10 indegree centrality profiles of the 20 biggest communities.n Excluding 'starburst' communities."),
plot.caption = element_text("Source: Twitter"))+
theme_few()


Halway-Solution: So with the summarise method suggested by @s_t, we have the following code:



Fuller_list %>% 
as.data.frame() %>%
add_count(from_infomap) %>%
filter(from_infomap %in% coms_keep) %>%
group_by(from_infomap,topic) %>% # group by the topic and community
summarise(nn = n()) %>% # count the mentioned arguments
top_n(10, nn) %>%
ungroup() %>%
arrange(from_infomap, nn) %>%
ggplot(aes(x = fct_reorder(topic,nn),y = nn,fill = from_infomap))+
geom_col(width = 1)+
facet_wrap(~from_infomap, scales = "free")+
coord_flip()+
theme(plot.title = element_text("Central Players"),
plot.subtitle= element_text("Top 10 indegree centrality profiles of the 20 biggest communities.n Excluding 'starburst' communities."),
plot.caption = element_text("Source: Twitter"))+
theme_few()


And this produces: enter image description here



Which is the correct top_n(10) of the various communities. For all practical purposes, the plot now shows the correct data. The only remaining issue is that the arrange does not sort the various topics in desc order per community, but rather overall. Minor issue, would only improve aes if the topics could be arranged per community.










share|improve this question
















Consider the following head(10) of a dataframe:



Consider the following head(10) of a dataframe



It is generated by this dplyr code:



Fuller_list %>% 
as.data.frame() %>%
select(from_infomap, topic) %>%
add_count(from_infomap) %>%
filter(from_infomap %in% coms_keep) %>%
group_by(from_infomap) %>%
add_count(topic) %>%
top_n(10, nn) %>%
head(10)


There are 36 different communities in the "from_infomap" column and 47 different topics in the "topic" column. Grouped by "from_infomap" the number of topics per community, for the first 5 communities, look like this:



enter image description here
I would like to show the top 10 most occurring topics per community, ordered descending. I tried to do that here with:



 group_by(from_infomap) %>%
add_count(topic) %>%
top_n(10, nn)


But if I plot that, it only returns the top 1 topic per community:



enter image description here



I'm not sure what I'm doing wrong. According to this stack overflow query, the weighted top_n(n,wt) function on the count should work, it should give the top 10 topics weighted by their count, grouped by community.



If anyone could perhaps suggest an alternative or point out where I'm going wrong, it would be greatly appreciated. Apologies for the small screenshots, I can't show the entire data.frame here, as it is quite large.



Thanks!



Edit: dput without the group_by, add_count and top_n:



n <- Fuller_list %>% 
as.data.frame() %>%
select(from_infomap, topic) %>%
add_count(from_infomap) %>%
filter(from_infomap %in% coms_keep) %>%
group_by(from_infomap)


dput(head(n,10)):



structure(list(from_infomap = c(1L, 1L, 1L, 3L, 3L, 3L, 4L, 4L, 
4L, 4L), topic = c("KnysnaFire_thanks_wofire", "Abramjee_caperelief_operationsa",
"Pick_n_Pay", "Plett_heavy_rain_snow", "Disasters_help_call",
"KFM_disasters_discussion", "Pick_n_Pay", "Pick_n_Pay", "Pick_n_Pay",
"Pick_n_Pay"), n = c(30512L, 30512L, 30512L, 6572L, 6572L, 6572L,
5030L, 5030L, 5030L, 5030L)), row.names = c(NA, -10L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), vars = "from_infomap", drop = TRUE, indices = list(
0:2, 3:5, 6:9), group_sizes = c(3L, 3L, 4L), biggest_group_size = 4L, labels = structure(list(
from_infomap = c(1L, 3L, 4L)), row.names = c(NA, -3L), class = "data.frame", vars = "from_infomap", drop = TRUE))


Issue should be reproducible by adding this code to the previous chunk:



  add_count(topic) %>%
top_n(10,nn) %>%
ungroup() %>%
ggplot(aes(x = fct_reorder(topic,nn),y = nn,fill = from_infomap))+
geom_col(width = 1)+
facet_wrap(~from_infomap, scales = "free")+
coord_flip()+
theme(plot.title = element_text("Central Players"),
plot.subtitle= element_text("Top 10 indegree centrality profiles of the 20 biggest communities.n Excluding 'starburst' communities."),
plot.caption = element_text("Source: Twitter"))+
theme_few()


Halway-Solution: So with the summarise method suggested by @s_t, we have the following code:



Fuller_list %>% 
as.data.frame() %>%
add_count(from_infomap) %>%
filter(from_infomap %in% coms_keep) %>%
group_by(from_infomap,topic) %>% # group by the topic and community
summarise(nn = n()) %>% # count the mentioned arguments
top_n(10, nn) %>%
ungroup() %>%
arrange(from_infomap, nn) %>%
ggplot(aes(x = fct_reorder(topic,nn),y = nn,fill = from_infomap))+
geom_col(width = 1)+
facet_wrap(~from_infomap, scales = "free")+
coord_flip()+
theme(plot.title = element_text("Central Players"),
plot.subtitle= element_text("Top 10 indegree centrality profiles of the 20 biggest communities.n Excluding 'starburst' communities."),
plot.caption = element_text("Source: Twitter"))+
theme_few()


And this produces: enter image description here



Which is the correct top_n(10) of the various communities. For all practical purposes, the plot now shows the correct data. The only remaining issue is that the arrange does not sort the various topics in desc order per community, but rather overall. Minor issue, would only improve aes if the topics could be arranged per community.







r dataframe dplyr






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 22 '18 at 11:06







Petrus

















asked Nov 22 '18 at 7:51









PetrusPetrus

286




286













  • Could you post some usable data (not an image)?

    – s_t
    Nov 22 '18 at 7:58











  • Here are some tips on how: stackoverflow.com/questions/5963269/…

    – sindri_baldur
    Nov 22 '18 at 7:58











  • @s_t I've added a link to a downloadable sample of the code. I hope this is what you were looking for? drive.google.com/file/d/128R9Vgjd2QsFwHf0M5Yi8ltli2dsDsrJ/…

    – Petrus
    Nov 22 '18 at 8:11











  • @snoram, I would like to use dput to give a sample of the code, but the dataset is quite large (many variables), it won't be practically viewable here. I've provided a link to a 1000 line sample, I hope this is in order?

    – Petrus
    Nov 22 '18 at 8:14






  • 2





    @s_t Of course, I've added the output from dput(head()) below. Is that what you were looking for? Apologies, I'm not very experienced at this yet.

    – Petrus
    Nov 22 '18 at 8:21



















  • Could you post some usable data (not an image)?

    – s_t
    Nov 22 '18 at 7:58











  • Here are some tips on how: stackoverflow.com/questions/5963269/…

    – sindri_baldur
    Nov 22 '18 at 7:58











  • @s_t I've added a link to a downloadable sample of the code. I hope this is what you were looking for? drive.google.com/file/d/128R9Vgjd2QsFwHf0M5Yi8ltli2dsDsrJ/…

    – Petrus
    Nov 22 '18 at 8:11











  • @snoram, I would like to use dput to give a sample of the code, but the dataset is quite large (many variables), it won't be practically viewable here. I've provided a link to a 1000 line sample, I hope this is in order?

    – Petrus
    Nov 22 '18 at 8:14






  • 2





    @s_t Of course, I've added the output from dput(head()) below. Is that what you were looking for? Apologies, I'm not very experienced at this yet.

    – Petrus
    Nov 22 '18 at 8:21

















Could you post some usable data (not an image)?

– s_t
Nov 22 '18 at 7:58





Could you post some usable data (not an image)?

– s_t
Nov 22 '18 at 7:58













Here are some tips on how: stackoverflow.com/questions/5963269/…

– sindri_baldur
Nov 22 '18 at 7:58





Here are some tips on how: stackoverflow.com/questions/5963269/…

– sindri_baldur
Nov 22 '18 at 7:58













@s_t I've added a link to a downloadable sample of the code. I hope this is what you were looking for? drive.google.com/file/d/128R9Vgjd2QsFwHf0M5Yi8ltli2dsDsrJ/…

– Petrus
Nov 22 '18 at 8:11





@s_t I've added a link to a downloadable sample of the code. I hope this is what you were looking for? drive.google.com/file/d/128R9Vgjd2QsFwHf0M5Yi8ltli2dsDsrJ/…

– Petrus
Nov 22 '18 at 8:11













@snoram, I would like to use dput to give a sample of the code, but the dataset is quite large (many variables), it won't be practically viewable here. I've provided a link to a 1000 line sample, I hope this is in order?

– Petrus
Nov 22 '18 at 8:14





@snoram, I would like to use dput to give a sample of the code, but the dataset is quite large (many variables), it won't be practically viewable here. I've provided a link to a 1000 line sample, I hope this is in order?

– Petrus
Nov 22 '18 at 8:14




2




2





@s_t Of course, I've added the output from dput(head()) below. Is that what you were looking for? Apologies, I'm not very experienced at this yet.

– Petrus
Nov 22 '18 at 8:21





@s_t Of course, I've added the output from dput(head()) below. Is that what you were looking for? Apologies, I'm not very experienced at this yet.

– Petrus
Nov 22 '18 at 8:21












1 Answer
1






active

oldest

votes


















1














Maybe this can help, if I've understood well, you would like to count the topics in each community, select the top(X), and plot them in a decreasing way in each facet:



library(ggplot2)
library(dplyr)

data3 <-
data2 %>%
select(-n) %>% # remove useless column
group_by(from_infomap,topic) %>% # group by the topic and community
summarise(nn = n()) %>% # count the mentioned arguments
top_n(5, nn) # take the top 5 in this case


Now we handle the order, as stated here:



data4 <- data3 %>% 
ungroup() %>%
arrange(from_infomap, nn) %>%
mutate(topic_r = row_number())


Lastly the plot:



ggplot(data4, aes(topic_r, nn,fill = from_infomap)) + 
geom_col() +
facet_wrap(~ from_infomap, scales = "free") +
scale_x_continuous(
breaks = d$topic_r,
labels = d$topic
) +
coord_flip()


enter image description here



I have used some fake data, like these:



data2 <- data.frame(from_infomap =floor(runif(200, 1,5)) ,
topic = sample(letters[1:20], 200, TRUE),
n = floor(runif(200, 10,50)) )


So many topics in communities have the same number, so you do not see only 5 columns.






share|improve this answer
























  • This is definitely a move in the right direction, I'm getting multiple topics per community now. I'm just getting a plot for every row though, but I'm going to work with this and get back to you with a solution. Thank you so much for your effort!

    – Petrus
    Nov 22 '18 at 10:19











  • If you remove the useless part in your question ad update it with this answer, we can try to solve everything if there are other issues.

    – s_t
    Nov 22 '18 at 10:32











  • I've added a "Halfway-Solution" to the question with your code added. Only remaining issue is the arrange that is applied overall, rather than per community. However, I'm more than happy to have the correct representation of data on the plot already. Thank you very much!

    – Petrus
    Nov 22 '18 at 11:05











  • Looking at your new code, you have forgot the some steps of my data4 , like mutate(...) that adds the variable to use in the plot, and the scale_x_continuous(...) part in the plot, to order the bars.

    – s_t
    Nov 22 '18 at 11:30













  • I tried to add the mutate() for topic_r and then the scale_x_continuous to the plot, but got the error: Discrete value supplied to continuous scale. I tried several solutions, such as scale_x_discrete, or coercing topic_r into a numeric value as suggested by a different stackoverflow question, but didn't succeed.

    – Petrus
    Nov 22 '18 at 11:49












Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53426147%2fgetting-the-top-n-values-by-group-not-working-as-expected%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














Maybe this can help, if I've understood well, you would like to count the topics in each community, select the top(X), and plot them in a decreasing way in each facet:



library(ggplot2)
library(dplyr)

data3 <-
data2 %>%
select(-n) %>% # remove useless column
group_by(from_infomap,topic) %>% # group by the topic and community
summarise(nn = n()) %>% # count the mentioned arguments
top_n(5, nn) # take the top 5 in this case


Now we handle the order, as stated here:



data4 <- data3 %>% 
ungroup() %>%
arrange(from_infomap, nn) %>%
mutate(topic_r = row_number())


Lastly the plot:



ggplot(data4, aes(topic_r, nn,fill = from_infomap)) + 
geom_col() +
facet_wrap(~ from_infomap, scales = "free") +
scale_x_continuous(
breaks = d$topic_r,
labels = d$topic
) +
coord_flip()


enter image description here



I have used some fake data, like these:



data2 <- data.frame(from_infomap =floor(runif(200, 1,5)) ,
topic = sample(letters[1:20], 200, TRUE),
n = floor(runif(200, 10,50)) )


So many topics in communities have the same number, so you do not see only 5 columns.






share|improve this answer
























  • This is definitely a move in the right direction, I'm getting multiple topics per community now. I'm just getting a plot for every row though, but I'm going to work with this and get back to you with a solution. Thank you so much for your effort!

    – Petrus
    Nov 22 '18 at 10:19











  • If you remove the useless part in your question ad update it with this answer, we can try to solve everything if there are other issues.

    – s_t
    Nov 22 '18 at 10:32











  • I've added a "Halfway-Solution" to the question with your code added. Only remaining issue is the arrange that is applied overall, rather than per community. However, I'm more than happy to have the correct representation of data on the plot already. Thank you very much!

    – Petrus
    Nov 22 '18 at 11:05











  • Looking at your new code, you have forgot the some steps of my data4 , like mutate(...) that adds the variable to use in the plot, and the scale_x_continuous(...) part in the plot, to order the bars.

    – s_t
    Nov 22 '18 at 11:30













  • I tried to add the mutate() for topic_r and then the scale_x_continuous to the plot, but got the error: Discrete value supplied to continuous scale. I tried several solutions, such as scale_x_discrete, or coercing topic_r into a numeric value as suggested by a different stackoverflow question, but didn't succeed.

    – Petrus
    Nov 22 '18 at 11:49
















1














Maybe this can help, if I've understood well, you would like to count the topics in each community, select the top(X), and plot them in a decreasing way in each facet:



library(ggplot2)
library(dplyr)

data3 <-
data2 %>%
select(-n) %>% # remove useless column
group_by(from_infomap,topic) %>% # group by the topic and community
summarise(nn = n()) %>% # count the mentioned arguments
top_n(5, nn) # take the top 5 in this case


Now we handle the order, as stated here:



data4 <- data3 %>% 
ungroup() %>%
arrange(from_infomap, nn) %>%
mutate(topic_r = row_number())


Lastly the plot:



ggplot(data4, aes(topic_r, nn,fill = from_infomap)) + 
geom_col() +
facet_wrap(~ from_infomap, scales = "free") +
scale_x_continuous(
breaks = d$topic_r,
labels = d$topic
) +
coord_flip()


enter image description here



I have used some fake data, like these:



data2 <- data.frame(from_infomap =floor(runif(200, 1,5)) ,
topic = sample(letters[1:20], 200, TRUE),
n = floor(runif(200, 10,50)) )


So many topics in communities have the same number, so you do not see only 5 columns.






share|improve this answer
























  • This is definitely a move in the right direction, I'm getting multiple topics per community now. I'm just getting a plot for every row though, but I'm going to work with this and get back to you with a solution. Thank you so much for your effort!

    – Petrus
    Nov 22 '18 at 10:19











  • If you remove the useless part in your question ad update it with this answer, we can try to solve everything if there are other issues.

    – s_t
    Nov 22 '18 at 10:32











  • I've added a "Halfway-Solution" to the question with your code added. Only remaining issue is the arrange that is applied overall, rather than per community. However, I'm more than happy to have the correct representation of data on the plot already. Thank you very much!

    – Petrus
    Nov 22 '18 at 11:05











  • Looking at your new code, you have forgot the some steps of my data4 , like mutate(...) that adds the variable to use in the plot, and the scale_x_continuous(...) part in the plot, to order the bars.

    – s_t
    Nov 22 '18 at 11:30













  • I tried to add the mutate() for topic_r and then the scale_x_continuous to the plot, but got the error: Discrete value supplied to continuous scale. I tried several solutions, such as scale_x_discrete, or coercing topic_r into a numeric value as suggested by a different stackoverflow question, but didn't succeed.

    – Petrus
    Nov 22 '18 at 11:49














1












1








1







Maybe this can help, if I've understood well, you would like to count the topics in each community, select the top(X), and plot them in a decreasing way in each facet:



library(ggplot2)
library(dplyr)

data3 <-
data2 %>%
select(-n) %>% # remove useless column
group_by(from_infomap,topic) %>% # group by the topic and community
summarise(nn = n()) %>% # count the mentioned arguments
top_n(5, nn) # take the top 5 in this case


Now we handle the order, as stated here:



data4 <- data3 %>% 
ungroup() %>%
arrange(from_infomap, nn) %>%
mutate(topic_r = row_number())


Lastly the plot:



ggplot(data4, aes(topic_r, nn,fill = from_infomap)) + 
geom_col() +
facet_wrap(~ from_infomap, scales = "free") +
scale_x_continuous(
breaks = d$topic_r,
labels = d$topic
) +
coord_flip()


enter image description here



I have used some fake data, like these:



data2 <- data.frame(from_infomap =floor(runif(200, 1,5)) ,
topic = sample(letters[1:20], 200, TRUE),
n = floor(runif(200, 10,50)) )


So many topics in communities have the same number, so you do not see only 5 columns.






share|improve this answer













Maybe this can help, if I've understood well, you would like to count the topics in each community, select the top(X), and plot them in a decreasing way in each facet:



library(ggplot2)
library(dplyr)

data3 <-
data2 %>%
select(-n) %>% # remove useless column
group_by(from_infomap,topic) %>% # group by the topic and community
summarise(nn = n()) %>% # count the mentioned arguments
top_n(5, nn) # take the top 5 in this case


Now we handle the order, as stated here:



data4 <- data3 %>% 
ungroup() %>%
arrange(from_infomap, nn) %>%
mutate(topic_r = row_number())


Lastly the plot:



ggplot(data4, aes(topic_r, nn,fill = from_infomap)) + 
geom_col() +
facet_wrap(~ from_infomap, scales = "free") +
scale_x_continuous(
breaks = d$topic_r,
labels = d$topic
) +
coord_flip()


enter image description here



I have used some fake data, like these:



data2 <- data.frame(from_infomap =floor(runif(200, 1,5)) ,
topic = sample(letters[1:20], 200, TRUE),
n = floor(runif(200, 10,50)) )


So many topics in communities have the same number, so you do not see only 5 columns.







share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 22 '18 at 10:00









s_ts_t

3,69121133




3,69121133













  • This is definitely a move in the right direction, I'm getting multiple topics per community now. I'm just getting a plot for every row though, but I'm going to work with this and get back to you with a solution. Thank you so much for your effort!

    – Petrus
    Nov 22 '18 at 10:19











  • If you remove the useless part in your question ad update it with this answer, we can try to solve everything if there are other issues.

    – s_t
    Nov 22 '18 at 10:32











  • I've added a "Halfway-Solution" to the question with your code added. Only remaining issue is the arrange that is applied overall, rather than per community. However, I'm more than happy to have the correct representation of data on the plot already. Thank you very much!

    – Petrus
    Nov 22 '18 at 11:05











  • Looking at your new code, you have forgot the some steps of my data4 , like mutate(...) that adds the variable to use in the plot, and the scale_x_continuous(...) part in the plot, to order the bars.

    – s_t
    Nov 22 '18 at 11:30













  • I tried to add the mutate() for topic_r and then the scale_x_continuous to the plot, but got the error: Discrete value supplied to continuous scale. I tried several solutions, such as scale_x_discrete, or coercing topic_r into a numeric value as suggested by a different stackoverflow question, but didn't succeed.

    – Petrus
    Nov 22 '18 at 11:49



















  • This is definitely a move in the right direction, I'm getting multiple topics per community now. I'm just getting a plot for every row though, but I'm going to work with this and get back to you with a solution. Thank you so much for your effort!

    – Petrus
    Nov 22 '18 at 10:19











  • If you remove the useless part in your question ad update it with this answer, we can try to solve everything if there are other issues.

    – s_t
    Nov 22 '18 at 10:32











  • I've added a "Halfway-Solution" to the question with your code added. Only remaining issue is the arrange that is applied overall, rather than per community. However, I'm more than happy to have the correct representation of data on the plot already. Thank you very much!

    – Petrus
    Nov 22 '18 at 11:05











  • Looking at your new code, you have forgot the some steps of my data4 , like mutate(...) that adds the variable to use in the plot, and the scale_x_continuous(...) part in the plot, to order the bars.

    – s_t
    Nov 22 '18 at 11:30













  • I tried to add the mutate() for topic_r and then the scale_x_continuous to the plot, but got the error: Discrete value supplied to continuous scale. I tried several solutions, such as scale_x_discrete, or coercing topic_r into a numeric value as suggested by a different stackoverflow question, but didn't succeed.

    – Petrus
    Nov 22 '18 at 11:49

















This is definitely a move in the right direction, I'm getting multiple topics per community now. I'm just getting a plot for every row though, but I'm going to work with this and get back to you with a solution. Thank you so much for your effort!

– Petrus
Nov 22 '18 at 10:19





This is definitely a move in the right direction, I'm getting multiple topics per community now. I'm just getting a plot for every row though, but I'm going to work with this and get back to you with a solution. Thank you so much for your effort!

– Petrus
Nov 22 '18 at 10:19













If you remove the useless part in your question ad update it with this answer, we can try to solve everything if there are other issues.

– s_t
Nov 22 '18 at 10:32





If you remove the useless part in your question ad update it with this answer, we can try to solve everything if there are other issues.

– s_t
Nov 22 '18 at 10:32













I've added a "Halfway-Solution" to the question with your code added. Only remaining issue is the arrange that is applied overall, rather than per community. However, I'm more than happy to have the correct representation of data on the plot already. Thank you very much!

– Petrus
Nov 22 '18 at 11:05





I've added a "Halfway-Solution" to the question with your code added. Only remaining issue is the arrange that is applied overall, rather than per community. However, I'm more than happy to have the correct representation of data on the plot already. Thank you very much!

– Petrus
Nov 22 '18 at 11:05













Looking at your new code, you have forgot the some steps of my data4 , like mutate(...) that adds the variable to use in the plot, and the scale_x_continuous(...) part in the plot, to order the bars.

– s_t
Nov 22 '18 at 11:30







Looking at your new code, you have forgot the some steps of my data4 , like mutate(...) that adds the variable to use in the plot, and the scale_x_continuous(...) part in the plot, to order the bars.

– s_t
Nov 22 '18 at 11:30















I tried to add the mutate() for topic_r and then the scale_x_continuous to the plot, but got the error: Discrete value supplied to continuous scale. I tried several solutions, such as scale_x_discrete, or coercing topic_r into a numeric value as suggested by a different stackoverflow question, but didn't succeed.

– Petrus
Nov 22 '18 at 11:49





I tried to add the mutate() for topic_r and then the scale_x_continuous to the plot, but got the error: Discrete value supplied to continuous scale. I tried several solutions, such as scale_x_discrete, or coercing topic_r into a numeric value as suggested by a different stackoverflow question, but didn't succeed.

– Petrus
Nov 22 '18 at 11:49




















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53426147%2fgetting-the-top-n-values-by-group-not-working-as-expected%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Guess what letter conforming each word

Run scheduled task as local user group (not BUILTIN)

Port of Spain