🚧 Sprout is still in active development and evolving quickly, so the documentation and functionality may not work as described and could undergo substantial changes 🚧
Reads all the batch resource file(s) into a list of (Polars) DataFrames.
This function reads the Parquet file(s) given by paths into a list of Polars DataFrames, then checks each DataFrame in the list against the resource_properties. Although Sprout generally assumes that the files stored in the resources/<id>/batch/ folder are already correctly structured and tidy, this function still compares the data to the properties to confirm they are correct.
Parameters
resource_properties:ResourceProperties
The ResourceProperties object that contains the properties of the resource you want to check the data against.
paths:list[Path] | None=None
A list of paths for all the files in the resource's batch/ folder. Use path_resource_batch_files() to help provide the correct paths to the batch files. Defaults to the batch files of the given resource.
Returns
list[pl.DataFrame]
Outputs a list of DataFrame objects from all the batch files.
Raises
FileNotFoundError
If a file in the list of paths doesn't exist.
ValueError
If the batch file name is not in the expected pattern.
ValueError
If the timestamp column name matches an existing column in the DataFrame.
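The three error conditions above could be checked along these lines. This is a sketch under stated assumptions: the file name pattern (a compact UTC timestamp followed by a UUID) and the function name `check_batch_file_sketch` are hypothetical, since the pattern Sprout expects and the name of its timestamp column are not given here.

```python
import re
from pathlib import Path

# ASSUMPTION: a hypothetical batch file name pattern, e.g.
# "2024-05-17T101516Z-<uuid>.parquet". The real expected pattern may differ.
BATCH_NAME_PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{6}Z-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}\.parquet$"
)


def check_batch_file_sketch(
    path: Path, columns: list[str], timestamp_column: str
) -> None:
    """Raise the errors listed above for one batch file (illustrative only)."""
    if not path.is_file():
        raise FileNotFoundError(f"Batch file not found: {path}")
    if not BATCH_NAME_PATTERN.match(path.name):
        raise ValueError(f"Unexpected batch file name: {path.name}")
    # The timestamp column name must not clash with an existing data column.
    if timestamp_column in columns:
        raise ValueError(
            f"Timestamp column name '{timestamp_column}' already exists in the data."
        )
```

Running the cheap path and name checks first means a malformed batch folder is reported before any column-level comparison takes place.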