Here is the sample data and code I'm working with:
data = [
(1, """
Lion
Apple
Banana
Tiger
Cranberry
"""),
(2, """
Lion
Apple
Tiger
Banana
Zebra
""")
]
df = spark.createDataFrame(data, ["id", "xml_string"])
What the XPath queries return:
For the data and data2 columns, as (id, data, data2):
(1, ["Apple","Banana","Cranberry"], ["Lion","Tiger"])
(2, ["Apple","Banana"], ["Lion","Tiger","Zebra"])
What I want:
For the data and data2 columns, as (id, data, data2):
(1, ["Apple","Banana","Cranberry"], ["Lion", None, "Tiger"])
(2, ["Apple","Banana", None], ["Lion","Tiger","Zebra"])
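How can I adjust my XPath queries so that a missing value shows up as None and the two arrays stay aligned? These are the queries I use now: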
root/level1/level2/level3/level4/data
root/level1/level2/level3/data2
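One possible direction, sketched under assumptions: an XPath query returns only the nodes that match, so a missing element leaves no gap in the result and the two arrays cannot be aligned afterwards. An alternative is to walk each level3 element and look up both children relative to it, recording None when one is absent. The sketch below uses plain Python (xml.etree.ElementTree) rather than Spark's xpath function. The tag layout in SAMPLE_XML is an assumption inferred from the two query paths, since the markup in the sample strings above did not survive, and extract_aligned is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: this tag layout is inferred from the two XPath queries in the
# question; the real documents may be structured differently.
SAMPLE_XML = """
<root><level1><level2>
  <level3><data2>Lion</data2><level4><data>Apple</data></level4></level3>
  <level3><level4><data>Banana</data></level4></level3>
  <level3><data2>Tiger</data2><level4><data>Cranberry</data></level4></level3>
</level2></level1></root>
"""


def extract_aligned(xml_string):
    """Return (data, data2) with one slot per <level3>, None where missing.

    Hypothetical helper, not part of any library.
    """
    root = ET.fromstring(xml_string.strip())
    data, data2 = [], []
    # The parsed element is <root> itself, so paths are relative to it.
    for level3 in root.findall("level1/level2/level3"):
        # findtext returns None when the child element does not exist.
        data.append(level3.findtext("level4/data"))
        data2.append(level3.findtext("data2"))
    return data, data2


fruits, animals = extract_aligned(SAMPLE_XML)
print(fruits)   # ['Apple', 'Banana', 'Cranberry']
print(animals)  # ['Lion', None, 'Tiger']
```

If this matches the real structure, the same function could be wrapped in a PySpark UDF that returns two array<string> fields and applied to the xml_string column. That trades the built-in xpath function for Python-side parsing, which is likely slower on large data but keeps the positions of the two arrays in step.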
Thanks