data = [("definitely somewhere",), ("Las Vegas",), ("其他",), (None,), ("",), ("Pucela Madrid Langreo, España",), ("Trenches, With Egbon Adugbo",)]
df = spark.createDataFrame(data, ["address"])
city_country = {
'las vegas': 'US',
'lagos': 'NG',
'España': 'ES'
}
cities_name_to_code = spark.sparkContext.broadcast(city_country )
df_with_codes = df.withColumn('cities_array', F.lower(F.col('address'))) \
.withColumn('cities_array', F.split(F.col('cities_array'), ', '))
For each row, I want to look up every element of cities_array against the keys of cities_name_to_code and end up with an array of the matching values. The constraint is that I don't want to use a UDF.
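A rough, untested sketch of the UDF-free direction I have in mind, assuming Spark 3.1+ (for transform and filter with Python lambdas) and that an exact match on each split token is what's wanted; lowercasing the dictionary keys and using element_at for the lookup are my assumptions:

from itertools import chain

# Build a MapType literal from the plain dict (the broadcast variable is not
# needed here, since the literal is embedded directly in the plan). Keys are
# lowercased so they line up with the lowercased address tokens.
lookup = F.create_map(*[
    F.lit(x)
    for x in chain.from_iterable((k.lower(), v) for k, v in city_country.items())
])

# Look up each token in the map; element_at returns null for missing keys
# (with default, non-ANSI settings), then the nulls are filtered away so only
# the matched country codes remain.
df_result = df_with_codes.withColumn(
    'country_codes',
    F.filter(
        F.transform(F.col('cities_array'), lambda c: F.element_at(lookup, c)),
        lambda code: code.isNotNull()
    )
)
df_result.show(truncate=False)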
Thanks