```python
pd.read_parquet(r'C:\Datasets\cn_data\dm\qmt\wqa_mfeatures\30m\year=2020\month=11\data.parquet')
```

No error is reported. But when I read the partition directory:

```python
pd.read_parquet(r'C:\Datasets\cn_data\dm\qmt\wqa_mfeatures\30m\year=2020')
```

an error is raised:

```
ArrowTypeError: Unable to merge: Field month has incompatible types: int32 vs dictionary
```
This is because I hand-crafted this partitioned path myself.
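For now I can work around the read error by declaring the partition schema explicitly through pyarrow's dataset API, which skips the dictionary inference for the hive keys. This is only a sketch, assuming a recent pyarrow; the root path and the year filter are just for illustration, and my guess at the cause (the `month` column living both inside my hand-written files as int32 and in the directory name as a dictionary-encoded key) is noted in the comments:

```python
import pyarrow as pa
import pyarrow.dataset as ds

# Declare year/month as plain int32 so the hive keys are not
# dictionary-encoded, which otherwise clashes with the int32 month
# column stored inside my hand-written files (my guess at the cause).
part = ds.partitioning(
    pa.schema([("year", pa.int32()), ("month", pa.int32())]),
    flavor="hive",
)
dataset = ds.dataset(
    r"C:\Datasets\cn_data\dm\qmt\wqa_mfeatures\30m",
    format="parquet",
    partitioning=part,
)
df = dataset.to_table(filter=ds.field("year") == 2020).to_pandas()
```

Recent pandas versions also forward extra keyword arguments to pyarrow, so `pd.read_parquet(path, partitioning=part)` may work too, but I would rather not have to thread this through every project.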
It is important that I am able to hand-craft the partitioned paths, for the following reasons.
I have 5000 items to transform and write with `df.to_parquet(path, partition_cols=['year', 'month'])`, and that way of writing does not overwrite existing files:

* If I only need to rerun 300 items, the rerun produces new files next to the old ones, and I cannot delete the data produced by the last run.
* As time goes on, I need to rerun the transform function on new dates (year, month), and the new data needs to overwrite the old data (see the write sketch after this list).
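To be concrete about the hand-crafted write, here is a simplified sketch of the kind of helper I mean (`write_partition` is just an illustrative name, not my exact code): each (year, month) slice goes to a fixed `year=YYYY/month=MM/data.parquet` path, so rerunning an item overwrites that partition in place.

```python
import os
import pandas as pd

def write_partition(df: pd.DataFrame, root: str, year: int, month: int) -> None:
    # Fixed hive-style path, so a rerun of the same (year, month)
    # overwrites data.parquet instead of adding a new file next to it.
    path = os.path.join(root, f"year={year}", f"month={month}")
    os.makedirs(path, exist_ok=True)
    # partition_cols drops the key columns from the files it writes;
    # doing the same here keeps the file schema consistent with the
    # directory keys when the whole tree is read back.
    df.drop(columns=["year", "month"], errors="ignore").to_parquet(
        os.path.join(path, "data.parquet"), index=False
    )
```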
I just want the pd.read_parquet function to keep working with hand-crafted paths; that would save a lot of work reimplementing something similar and refactoring pd.read_parquet calls in many projects.