I currently have a dataframe containing the texts, their published dates, and the key words used in each.
[image: my dataframe]
At the moment my plot has one row per text, labelled with the list of key words in that text. What I'm after instead is one row per key word, marking every instance of that word in the texts over time.
This is the kind of plot I'm after, versus what I'm getting:
[image: the type of thing I'm after]
[image: how my plot is coming out]
Here's the function I use to return the words from my key-word list that are present in a given description:
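import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from pyspark.sql.functions import udf

# top_ten_words holds the rows of the ten most frequent words (computed earlier)
top_ten = [row['word'] for row in top_ten_words]
# raw string so the backslash is kept literally instead of being read as an escape
punctuation = r',?!./\#@"[]'

@udf
def top_ten_list(description):
    '''
    function to return the key words which are present in a certain description string
    '''
    if description is None:
        return str([])
    out_data = []
    for word in description.split(" "):
        # lower-case and strip surrounding punctuation before matching,
        # and keep the cleaned word so "data," is recorded as "data"
        word = word.lower().strip(punctuation)
        if word in top_ten:
            out_data.append(word)
    return str(out_data)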
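The matching step can be tried out without a Spark session by pulling its body into a plain Python function. This is a minimal sketch: `extract_key_words` and the three-word `top_ten` list are hypothetical stand-ins, and it uses `split()` (any whitespace) rather than `split(" ")`. It returns the cleaned word, so "data," is recorded as "data".

```python
# Hypothetical key-word list standing in for top_ten from the question
top_ten = ["data", "model", "spark"]
punctuation = r',?!./\#@"[]'


def extract_key_words(description, key_words=top_ten):
    """Return the key words found in description, lower-cased and stripped of punctuation."""
    if description is None:
        return []
    found = []
    for word in description.split():
        cleaned = word.lower().strip(punctuation)
        if cleaned in key_words:
            found.append(cleaned)
    return found


print(extract_key_words("Spark makes data, and more DATA, easy!"))
# ['spark', 'data', 'data']
```

If the UDF returns `out_data` itself and is declared as `@udf(returnType=ArrayType(StringType()))` (types from `pyspark.sql.types`), the column stays a real array, and `pyspark.sql.functions.explode` can turn it into one row per word before `toPandas()`.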
I then run that function on my column with the following:
descriptions_with_top10 = descriptions_by_date.withColumn('top_tenners', top_ten_list(descriptions_by_date.description))
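Because `top_ten_list` returns `str(out_data)`, each value in `top_tenners` is a single string such as `"['data', 'spark']"`, so seaborn treats every distinct list as its own y-axis category. One way to get one y value per word is to reshape to long format once the data is in pandas. A minimal sketch, using a small made-up frame in place of the real `toPandas()` output: `ast.literal_eval` parses each string back into a list, and `DataFrame.explode` yields one row per (date, word) occurrence.

```python
import ast

import pandas as pd

# Made-up stand-in for top_ten_words_in_descriptions_df after toPandas()
df = pd.DataFrame({
    "published_date": pd.to_datetime(["2020-01-05", "2020-03-12", "2020-07-30"]),
    "top_tenners": ["['data', 'spark']", "[]", "['model', 'data', 'data']"],
})

# Parse the stringified lists back into Python lists
df["word"] = df["top_tenners"].apply(ast.literal_eval)

# One row per word occurrence; empty lists become NaN after explode, so drop them
long_df = (
    df.explode("word")
      .dropna(subset=["word"])
      .loc[:, ["published_date", "word"]]
      .reset_index(drop=True)
)
print(long_df)
```

Texts with no key words are dropped, and a word that occurs twice in one text produces two rows with the same date.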
Here's my plotting code so far. I know the y-axis needs something different, but I'm not sure how to express it to get the plot I'm after:
# make df into pandas df to plot with seaborn
top_ten_words_in_descriptions_df = descriptions_with_top10.toPandas()
# plot dispersion plot
top_ten_plot = sns.stripplot(data = top_ten_words_in_descriptions_df, x = "published_date", y = "top_tenners", orient='h', marker='X', color='navy', size=3)
# rotate x tick labels
plt.xticks(rotation=15)
# adjust spacing so the rotated tick labels aren't clipped
plt.tight_layout()
plt.show()
I want each key word to appear just once on the y-axis, with all of its occurrences marked along the date axis (as in the example of what I'm after), rather than one y-axis entry per description string listing its key words.
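With the data in long format (one row per occurrence, with columns `published_date` and `word`), each key word can get its own row on the y-axis. A minimal sketch with made-up data; it swaps in plain matplotlib `ax.scatter`, mapping each word to an integer row, because seaborn's handling of dates on the value axis of `stripplot` has varied between releases. In a recent seaborn, `sns.stripplot(data=long_df, x="published_date", y="word", marker="X", color="navy", size=3)` may give the same picture.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up long-format data: one row per key-word occurrence
long_df = pd.DataFrame({
    "published_date": pd.to_datetime(
        ["2020-01-05", "2020-01-05", "2020-03-12", "2020-07-30", "2020-07-30"]
    ),
    "word": ["data", "spark", "model", "data", "model"],
})

# Give each key word exactly one row on the y-axis
words = sorted(long_df["word"].unique())
row_of = {word: i for i, word in enumerate(words)}

fig, ax = plt.subplots(figsize=(8, 3))
# One marker per occurrence: x is the publication date, y is the word's row
ax.scatter(
    long_df["published_date"],
    long_df["word"].map(row_of),
    marker="X",
    color="navy",
    s=30,
)
ax.set_yticks(range(len(words)))
ax.set_yticklabels(words)
ax.set_xlabel("published_date")
ax.tick_params(axis="x", labelrotation=15)
# adjust spacing so the rotated date labels aren't clipped
fig.tight_layout()
plt.show()
```

Sorting the words fixes the row order; passing a hand-picked list (for example, ordered by frequency) instead of `sorted(...)` would control which word appears where.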
I hope that makes some sense to someone; any help would be very welcome :)