CoderFunda

27 September, 2023

How do I create a lexical dispersion plot in pyspark with seaborn?

Programing Coderfunda, September 27, 2023

I'm trying to make a lexical dispersion plot that shows when the words in a certain 'top ten' list are used over time, based on a column of strings in a PySpark DataFrame (another column holds the corresponding dates).


I've got my dataframe at the moment with the texts, their dates, and the words used in each.


[Image: my dataframe]


As it is, my plot has a row for each individual text with a list of the words in it, when what I'm after is a row for each key word, mapping the instances in which it appears in texts over time...


This is the kind of thing I'm after, vs what I'm getting...


[Image: the type of thing I'm after]


[Image: how my plot is coming out]


Here's a function which I use to return words from my list which are present in a given description:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from pyspark.sql.functions import split, udf

top_ten = [row['word'] for row in top_ten_words]

punctuation = r',?!./\#@"[]'

@udf
def top_ten_list(description):
    '''
    function to return the key words which are present in a certain description string
    '''
    list_of_words = description.split(" ")
    out_data = []

    for word in list_of_words:
        word = word.lower()
        if word.strip(punctuation) in top_ten:
            out_data.append(word)
        else:
            pass

    return str(out_data)




I then run that function on my column with the following:
descriptions_with_top10 = descriptions_by_date.withColumn('top_tenners', top_ten_list(descriptions_by_date.description))
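
Because `top_ten_list` returns `str(out_data)`, each description ends up with a single string value in `top_tenners`, which is likely why the plot gets one row per text. What follows is a minimal sketch of one possible alternative, not something from the original post: return the matches as an array and use `explode` so that each occurrence becomes its own row. The names `match_keywords`, `keyword_rows`, and the output column `word` are hypothetical; `description` and `published_date` are assumed to be the column names, as used in the question.

```python
PUNCTUATION = ',?!./\\#@"[]'  # same characters as the question's punctuation string


def match_keywords(description, keywords):
    """Return the cleaned keywords found in one description, in order of appearance."""
    if description is None:
        return []
    wanted = set(keywords)
    matches = []
    for token in description.split():
        cleaned = token.lower().strip(PUNCTUATION)
        if cleaned in wanted:
            # Append the cleaned form so "spark," and "spark" count as one word.
            matches.append(cleaned)
    return matches


def keyword_rows(descriptions_df, keywords):
    """Return a Spark DataFrame with one (published_date, word) row per occurrence.

    Assumes descriptions_df has 'description' and 'published_date' columns.
    """
    # Imported here so match_keywords can be used and tested without Spark.
    from pyspark.sql.functions import explode, udf
    from pyspark.sql.types import ArrayType, StringType

    find_keywords = udf(
        lambda text: match_keywords(text, keywords),
        ArrayType(StringType()),
    )
    # explode() emits one row per array element and drops rows with empty arrays.
    return descriptions_df.withColumn(
        "word", explode(find_keywords(descriptions_df.description))
    ).select("published_date", "word")
```

Under these assumptions, `keyword_rows(descriptions_by_date, top_ten).toPandas()` would give a long-format frame in which `word` can serve directly as the y variable.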



Here's my code so far to make my plot. I know I need something different in the y-axis but I'm just not sure how to word it so as to get what I'm after...
# make df into pandas df to plot with seaborn
top_ten_words_in_descriptions_df = descriptions_with_top10.toPandas()

# plot dispersion plot
top_ten_plot = sns.stripplot(data = top_ten_words_in_descriptions_df, x = "published_date", y = "top_tenners", orient='h', marker='X', color='navy', size=3)

# rotate x tick labels
plt.xticks(rotation=15)

# adjust padding so the rotated tick labels fit within the figure
plt.tight_layout()

plt.show()




I want each word to appear just once on the y-axis, with all of its occurrences marked, rather than the list of keywords for each description string.


I hope that makes some sense to someone; any help would be very welcome :)
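
One way to get a row per keyword after `toPandas()` is to turn each stringified list back into individual words, so the data has one record per (date, word) pair, and then plot `word` on the y-axis. The sketch below is only an illustration under stated assumptions: `top_tenners` holds strings such as `"['spark', 'data']"` (what `str(out_data)` produces), and the names `explode_keyword_lists`, `plot_dispersion`, and the `word` field are hypothetical.

```python
import ast

PUNCTUATION = ',?!./\\#@"[]'  # same characters as the question's punctuation string


def explode_keyword_lists(rows):
    """Turn (published_date, stringified_list) pairs into one record per word."""
    records = []
    for published_date, top_tenners in rows:
        # literal_eval safely parses the "['a', 'b']" text back into a list.
        for word in ast.literal_eval(top_tenners):
            records.append({
                "published_date": published_date,
                # Strip punctuation so "spark," and "spark" share a row.
                "word": word.strip(PUNCTUATION),
            })
    return records


def plot_dispersion(records, keyword_order):
    """Draw one horizontal row per keyword, one marker per occurrence."""
    # Imported here so explode_keyword_lists has no third-party dependencies.
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    long_df = pd.DataFrame(records)
    ax = sns.stripplot(
        data=long_df, x="published_date", y="word",
        order=keyword_order,  # fixes the y-axis to one row per keyword
        orient="h", marker="X", color="navy", size=3,
        jitter=False,  # keep markers on the keyword's line
    )
    plt.xticks(rotation=15)
    plt.tight_layout()
    return ax
```

With the pandas frame from the question, this might be called as `records = explode_keyword_lists(zip(df["published_date"], df["top_tenners"]))` followed by `plot_dispersion(records, top_ten)` and `plt.show()`, where `df` stands for `top_ten_words_in_descriptions_df`.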
Copyright © 2025 CoderFunda | Powered by Blogger