data = [("definitely somewhere",), ("Las Vegas",), ("其他",), (None,), ("",), ("Pucela Madrid Langreo, España",), ("Trenches, With Egbon Adugbo",)]
df = spark.createDataFrame(data, ["address"])
city_country = {
'las vegas': 'US',
'lagos': 'NG',
'España': 'ES'
}
cities_name_to_code = spark.sparkContext.broadcast(city_country )
df_with_codes = df.withColumn('cities_array', F.lower(F.col('address'))) \
.withColumn('cities_array', F.split(F.col('cities_array'), ', '))
For each row, I want to look up every element of cities_array against the keys of cities_name_to_code and end up with an array of the matching values. The constraint is that I don't want to use a UDF.
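A rough, untested sketch of the UDF-free direction I have in mind, assuming Spark 3.1+ (for transform and filter with Python lambdas) and that an exact match on each split token is what's wanted; lowercasing the dictionary keys and using element_at for the lookup are my assumptions:

from itertools import chain

# Build a MapType literal from the plain dict (the broadcast variable is not
# needed here, since the literal is embedded directly in the plan). Keys are
# lowercased so they line up with the lowercased address tokens.
lookup = F.create_map(*[
    F.lit(x)
    for x in chain.from_iterable((k.lower(), v) for k, v in city_country.items())
])

# Look up each token in the map; element_at returns null for missing keys
# (with default, non-ANSI settings), then the nulls are filtered away so only
# the matched country codes remain.
df_result = df_with_codes.withColumn(
    'country_codes',
    F.filter(
        F.transform(F.col('cities_array'), lambda c: F.element_at(lookup, c)),
        lambda code: code.isNotNull()
    )
)
df_result.show(truncate=False)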
Thanks