TableRepo

Bases: object

Helps with storing, extending and reading tabular data in parquet format.

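It first tries dividing the data into files based on `group_cols`. If `group_cols` is None, it divides based on `max_records` instead, splitting the data into files of at most `max_records` records. If `max_records` is 0, it just writes everything to a single file, `root_path.parquet`.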

If both `group_cols` and `max_records` are given, it creates a directory for each group (nested directories if multiple group columns are given).
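
The file layout described above can be pictured with a small, pure-Python sketch. It is not how `TableRepo` is implemented: `plan_layout`, the `file-<n>.parquet` chunk names and the `-`-joined group file names are hypothetical, and records are plain dicts instead of DataFrames. It only maps each would-be file path to the records that file would hold.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def _chunks(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i : i + size] for i in range(0, len(items), size)]


def plan_layout(records, root, group_cols=None, max_records=0):
    """Map each hypothetical parquet file path to the records it would hold."""
    root = PurePosixPath(root)
    if not group_cols:
        if max_records == 0:
            # no grouping and no size limit: everything goes to root_path.parquet
            return {root.parent / f"{root.name}.parquet": list(records)}
        # no grouping: split into files of at most max_records records each
        return {
            root / f"file-{i}.parquet": chunk
            for i, chunk in enumerate(_chunks(list(records), max_records))
        }
    # group records by the values of group_cols (as strings, for path names)
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(str(rec[col]) for col in group_cols)].append(rec)
    layout = {}
    for key, recs in groups.items():
        if max_records:
            # both given: one (nested) directory per group, chunked files inside
            group_dir = root.joinpath(*key)
            for i, chunk in enumerate(_chunks(recs, max_records)):
                layout[group_dir / f"file-{i}.parquet"] = chunk
        else:
            # group_cols only: one file per group, named after the group values
            layout[root / f"{'-'.join(key)}.parquet"] = recs
    return layout
```

For example, with `group_cols=["country", "year"]` and `max_records=1`, a record with country `HU` and year 2021 under root `data/repo` lands in `data/repo/HU/2021/file-0.parquet`.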

Attributes Summary

`dfs`
`full_metadata`
`group_cols`
`main_path`
`n_files`
`paths`
`tables`
`vc_path`

Methods Summary

`batch_extend`(df_iterator, **para_kwargs)
`env_ctx`(env_name)
`extend`(df)
`get_extending_df_batch_writer`([max_records])
`get_extending_dict_batch_writer`([max_records])
`get_extending_fixed_dict_batch_writer`(cols)
`get_full_df`()
`get_full_table`()
`get_partition_df`(partition[, partition_col])
`get_partition_paths`(partition_col)
`get_partition_table`(partition[, partition_col])
`get_replacing_df_batch_writer`([max_records])
`get_replacing_dict_batch_writer`([max_records])
`map_partitions`(fun[, level])
`mkdirs`([force])
`purge`()	Purges everything.
`read_df_from_path`(path[, lock, release])
`read_table_from_path`(path[, lock, release])
`replace_all`(df)	Purges everything and writes df instead.
`replace_groups`(df)	Replaces files based on file name; only viable if group_cols is set.
`replace_records`(df[, by_groups])	Replaces records in files based on index.
`set_env`(env)
`set_env_to_default`()

Attributes Documentation

dfs

full_metadata

group_cols

main_path

n_files

paths

tables

vc_path

Methods Documentation

batch_extend(df_iterator, **para_kwargs)

env_ctx(env_name)

extend(df: DataFrame)

get_extending_df_batch_writer(max_records=1000000)

get_extending_dict_batch_writer(max_records=1000000)

get_extending_fixed_dict_batch_writer(cols, max_records=1000000)
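
The `max_records` argument of the batch writers suggests buffering: records are collected and written out in batches rather than one at a time. The sketch below shows that pattern in isolation, assuming a writer that accepts dict records and hands each full batch to a `flush` callback (standing in for something like `extend`). `DictBatchWriter`, its `add` method and its use as a context manager are hypothetical; the interface of the real writers may differ.

```python
class DictBatchWriter:
    """Hypothetical buffered writer: collects dict records, flushes them in batches."""

    def __init__(self, flush, max_records=1_000_000):
        self._flush = flush  # callable that receives one list of records per batch
        self._max_records = max_records
        self._buffer = []

    def add(self, record: dict) -> None:
        """Buffer one record and flush as soon as the batch is full."""
        self._buffer.append(record)
        if len(self._buffer) >= self._max_records:
            self.flush()

    def flush(self) -> None:
        """Hand the buffered records over (if any) and start a fresh buffer."""
        if self._buffer:
            self._flush(self._buffer)
            self._buffer = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # on a clean exit, write the remaining partial batch
        if exc_type is None:
            self.flush()
        return False  # never swallow exceptions
```

With `max_records=2`, five added records are handed over as batches of 2, 2 and 1, the last one when the `with` block exits.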

get_full_df() → DataFrame

get_full_table() → Table

get_partition_df(partition: str, partition_col: str | None = None) → DataFrame

get_partition_paths(partition_col: str) → Iterable[tuple[str, Iterable[Path]]]

get_partition_table(partition: str, partition_col: str | None = None) → Table

get_replacing_df_batch_writer(max_records=1000000)

get_replacing_dict_batch_writer(max_records=1000000)

map_partitions(fun, level=None, **para_kwargs)

mkdirs(force=False)

purge(): purges everything

read_df_from_path(path: Path, lock: allocate_lock | None = None, release=True) → DataFrame

read_table_from_path(path, lock: allocate_lock | None = None, release=True) → Table

replace_all(df: DataFrame): purges everything and writes df instead

replace_groups(df: DataFrame): replace files based on file name, only viable if group_cols is set

replace_records(df: DataFrame, by_groups=False): replace records in files based on index

set_env(env: str)

set_env_to_default()