TableRepo
- class parquetranger.TableRepo(root_path: Path | str, max_records: int = 0, group_cols: str | list | HashPartitioner | None = None, env_parents: dict[str, Path | str] | None = None, mkdirs=True, extra_metadata: dict | None = None, drop_group_cols: bool = False, fixed_metadata: dict | None = None, allow_metadata_extension: bool = False)
Bases: object
Helps with storing, extending and reading tabular data in Parquet format.
It first tries to divide the data based on group_cols; if that is None, it tries to divide based on max_records; if max_records is 0, it just writes the file to root_path.parquet.
If both group_cols and max_records are given, it creates directories for the groups (nested directories if multiple columns are given).
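The order of checks described above can be sketched in plain Python. This is only an illustration of the documented precedence, not parquetranger's implementation: the function `describe_layout` and the mode names it returns are hypothetical, and it only handles `group_cols` given as a string or a list (not a `HashPartitioner`).

```python
from pathlib import Path


def describe_layout(root_path, group_cols=None, max_records=0):
    """Return a (mode, target) pair following the documented order of checks.

    Illustrative only: the mode names are invented for this example and
    nothing is read from or written to disk.
    """
    root = Path(root_path)
    # A single column name is treated like a one-element list.
    cols = [group_cols] if isinstance(group_cols, str) else list(group_cols or [])

    if cols and max_records > 0:
        # Both given: directories for the groups, nested for several columns.
        return ("group_directories", root)
    if cols:
        # group_cols is checked before max_records.
        return ("divided_by_group", root)
    if max_records > 0:
        # No grouping: divide based on the record count instead.
        return ("divided_by_record_count", root)
    # Neither is set: a single file at root_path.parquet.
    return ("single_file", root.parent / f"{root.name}.parquet")


print(describe_layout("data/events"))
print(describe_layout("data/events", group_cols=["year", "month"], max_records=1000))
```

The sketch makes the precedence explicit: `group_cols` is checked first, `max_records` only matters on its own when no grouping is set, and the single-file case is the fallback.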
Methods Summary
- batch_extend(df_iterator, **para_kwargs)
- env_ctx(env_name)
- extend(df)
- get_extending_df_batch_writer([max_records])
- get_extending_dict_batch_writer([max_records])
- get_partition_df(partition[, partition_col])
- get_partition_paths(partition_col)
- get_partition_table(partition[, partition_col])
- get_replacing_df_batch_writer([max_records])
- get_replacing_dict_batch_writer([max_records])
- map_partitions(fun[, level])
- mkdirs([force])
- purge()
purges everything
- read_df_from_path(path[, lock, release])
- read_table_from_path(path[, lock, release])
- replace_all(df)
purges everything and writes df instead
- replace_groups(df)
replace files based on file name, only viable if group_cols is set
- replace_records(df[, by_groups])
replace records in files based on index
- set_env(env)
Attributes Documentation
- dfs
- full_metadata
- group_cols
- main_path
- n_files
- paths
- tables
- vc_path
Methods Documentation
- batch_extend(df_iterator, **para_kwargs)
- env_ctx(env_name)
- extend(df: DataFrame)
- get_extending_df_batch_writer(max_records=1000000)
- get_extending_dict_batch_writer(max_records=1000000)
- get_extending_fixed_dict_batch_writer(cols, max_records=1000000)
- get_full_df() -> DataFrame
- get_full_table() -> Table
- get_partition_df(partition: str, partition_col: str | None = None) -> DataFrame
- get_partition_paths(partition_col: str) -> Iterable[tuple[str, Iterable[Path]]]
- get_partition_table(partition: str, partition_col: str | None = None) -> Table
- get_replacing_df_batch_writer(max_records=1000000)
- get_replacing_dict_batch_writer(max_records=1000000)
- map_partitions(fun, level=None, **para_kwargs)
- mkdirs(force=False)
- purge()
purges everything
- read_df_from_path(path: Path, lock: allocate_lock | None = None, release=True) -> DataFrame
- read_table_from_path(path, lock: allocate_lock | None = None, release=True) -> Table
- replace_all(df: DataFrame)
purges everything and writes df instead
- replace_groups(df: DataFrame)
replace files based on file name, only viable if group_cols is set
- replace_records(df: DataFrame, by_groups=False)
replace records in files based on index
- set_env(env: str)
- set_env_to_default()