Checking package and resource properties
🚧 Sprout is still in active development and evolving quickly, so the documentation and functionality may not work as described and could undergo substantial changes 🚧
The structure and content of package and resource properties follow the Frictionless Data Package standard. This standard defines the available fields for each type of metadata (called properties), specifies which fields are required, and describes the allowed values for each field. In addition to these specifications, Sprout introduces its own structural and formatting requirements.
To make it easy to check metadata against all these requirements, Sprout provides a set of check functions. Each check function takes a properties object (PackageProperties or ResourceProperties) as input, runs all necessary checks, and raises a group of CheckErrors if any checks fail (run help(CheckError) for more details). Each CheckError corresponds to a specific violated requirement. If certain error types are not relevant to your use case, you can configure the check functions to ignore these.
Checking package properties
A set of package properties with only the required fields filled in might look like:
import seedcase_sprout as sp
from textwrap import dedent
# For pretty printing of output
from pprint import pprint

package_properties = sp.PackageProperties(
    name="woolly-dormice",
    title="Hibernation Physiology of the Woolly Dormouse: A Scoping Review.",
    description=dedent('''
        This scoping review explores the hibernation physiology of the
        woolly dormouse, drawing on data collected over a 10-year period
        along the Taurus Mountain range in Turkey.
        '''
    ),
    id="123-abc-123",
    created="2014-05-14T05:00:01+00:00",
    version="1.0.0",
    licenses=[sp.LicenseProperties(name="odc-pddl")],
)
To check that these properties are indeed complete and well-formed, we use the check_package_properties() function. Since all required fields (e.g. name, description, and title) are filled out and have the correct format, the function will not raise any errors and will return the original input:
package_properties = sp.check_package_properties(package_properties)
pprint(package_properties)
PackageProperties(name='woolly-dormice',
id='123-abc-123',
title='Hibernation Physiology of the Woolly Dormouse: A '
'Scoping Review.',
description='\n'
'This scoping review explores the hibernation '
'physiology of the\n'
'woolly dormouse, drawing on data collected over '
'a 10-year period\n'
'along the Taurus Mountain range in Turkey.\n',
homepage=None,
version='1.0.0',
created='2014-05-14T05:00:01+00:00',
contributors=None,
keywords=None,
image=None,
licenses=[LicenseProperties(name='odc-pddl',
path=None,
title=None)],
resources=None,
sources=None)
Now, let’s say we didn’t include a description in our package properties. Running the check again, we see that an error is raised, alerting us that this is a required field. See the end of the output:
package_properties = sp.PackageProperties(
    name="woolly-dormice",
    title="Hibernation Physiology of the Woolly Dormouse: A Scoping Review.",
    id="123-abc-123",
    created="2014-05-14T05:00:01+00:00",
    version="1.0.0",
    licenses=[sp.LicenseProperties(name="odc-pddl")],
)
sp.check_package_properties(package_properties)
+ Exception Group Traceback (most recent call last):
| File "/home/runner/work/seedcase-sprout/seedcase-sprout/.venv/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3667, in run_code
| exec(code_obj, self.user_global_ns, self.user_ns)
| File "/tmp/ipykernel_2370/1332289238.py", line 9, in <module>
| sp.check_package_properties(package_properties)
| File "/home/runner/work/seedcase-sprout/seedcase-sprout/src/seedcase_sprout/check_properties.py", line 33, in check_package_properties
| return _generic_check_properties(
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/runner/work/seedcase-sprout/seedcase-sprout/src/seedcase_sprout/check_properties.py", line 166, in _generic_check_properties
| raise ExceptionGroup(
| ExceptionGroup: The following checks failed on the properties:
| PackageProperties(name='woolly-dormice', id='123-abc-123', title='Hibernation Physiology of the Woolly Dormouse: A Scoping Review.', description=None, homepage=None, version='1.0.0', created='2014-05-14T05:00:01+00:00', contributors=None, keywords=None, image=None, licenses=[LicenseProperties(name='odc-pddl', path=None, title=None)], resources=None, sources=None) (1 sub-exception)
+-+---------------- 1 ----------------
| seedcase_sprout.check_datapackage.check_error.CheckError: Error at `$.description` caused by `required`: 'description' is a required property
+------------------------------------
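To make the “required” check concrete, the sketch below mimics the last line of the output above in plain Python: it looks for missing fields in a dictionary and formats a message in the same shape. It is a hypothetical illustration, not Sprout's implementation. The function name find_missing_required and the REQUIRED tuple (name, title, description) are assumptions chosen to match the fields mentioned on this page, and treating None as "missing" is an assumption based on how the unset description appears as description=None in the output.

```python
# Hypothetical sketch: NOT Sprout's implementation, only an illustration
# of what a "required" check reports.

# Assumed subset of required package fields (taken from the examples on this page).
REQUIRED = ("name", "title", "description")


def find_missing_required(properties: dict, required=REQUIRED) -> list[str]:
    """Return one message per required field that is absent or set to None."""
    messages = []
    for field in required:
        if properties.get(field) is None:
            messages.append(
                f"Error at `$.{field}` caused by `required`: "
                f"'{field}' is a required property"
            )
    return messages


package = {
    "name": "woolly-dormice",
    "title": "Hibernation Physiology of the Woolly Dormouse: A Scoping Review.",
    "description": None,  # left out, as in the example above
}

for message in find_missing_required(package):
    print(message)
# prints: Error at `$.description` caused by `required`: 'description' is a required property
```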
Finally, let’s say there are multiple errors in our package properties. For example, we forgot the description and gave a name containing uppercase letters, spaces, parentheses, and non-ASCII characters, which doesn’t match the expected format. In this case, both errors are listed in the output:
package_properties = sp.PackageProperties(
    name="Woolly Dormice (Toros Dağları)",
    title="Hibernation Physiology of the Woolly Dormouse: A Scoping Review.",
    id="123-abc-123",
    created="2014-05-14T05:00:01+00:00",
    version="1.0.0",
    licenses=[sp.LicenseProperties(name="odc-pddl")],
)
sp.check_package_properties(package_properties)
+ Exception Group Traceback (most recent call last):
| File "/home/runner/work/seedcase-sprout/seedcase-sprout/.venv/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3667, in run_code
| exec(code_obj, self.user_global_ns, self.user_ns)
| File "/tmp/ipykernel_2370/3771139548.py", line 9, in <module>
| sp.check_package_properties(package_properties)
| File "/home/runner/work/seedcase-sprout/seedcase-sprout/src/seedcase_sprout/check_properties.py", line 33, in check_package_properties
| return _generic_check_properties(
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/runner/work/seedcase-sprout/seedcase-sprout/src/seedcase_sprout/check_properties.py", line 166, in _generic_check_properties
| raise ExceptionGroup(
| ExceptionGroup: The following checks failed on the properties:
| PackageProperties(name='Woolly Dormice (Toros Dağları)', id='123-abc-123', title='Hibernation Physiology of the Woolly Dormouse: A Scoping Review.', description=None, homepage=None, version='1.0.0', created='2014-05-14T05:00:01+00:00', contributors=None, keywords=None, image=None, licenses=[LicenseProperties(name='odc-pddl', path=None, title=None)], resources=None, sources=None) (2 sub-exceptions)
+-+---------------- 1 ----------------
| seedcase_sprout.check_datapackage.check_error.CheckError: Error at `$.description` caused by `required`: 'description' is a required property
+---------------- 2 ----------------
| seedcase_sprout.check_datapackage.check_error.CheckError: Error at `$.name` caused by `pattern`: 'Woolly Dormice (Toros Dağları)' does not match '^[a-z0-9._-]+$'
+------------------------------------
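The second error comes from a pattern requirement: the name must match the regular expression ^[a-z0-9._-]+$ shown in the output, i.e. it may contain only lowercase ASCII letters, digits, periods, underscores, and hyphens. The sketch below tests candidate names against that same pattern with Python's re module so you can try out a name before building the properties. The helper is_valid_name is hypothetical, and using re.fullmatch instead of the ^ and $ anchors is our own choice, since $ would also accept a trailing newline.

```python
import re

# The pattern from the error message above, without ^ and $,
# because re.fullmatch already requires the whole string to match.
NAME_PATTERN = re.compile(r"[a-z0-9._-]+")


def is_valid_name(name: str) -> bool:
    """Hypothetical helper: True if `name` fully matches the name pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


candidates = ["woolly-dormice", "Woolly Dormice (Toros Dağları)", "woolly_dormice.2015"]
for candidate in candidates:
    print(f"{candidate!r}: {is_valid_name(candidate)}")
```

Only the second candidate fails, for the same reason Sprout reports above.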
Note that check_package_properties() only checks whether the package properties fields are well-formed. It does not look at the fields of any associated resource properties, so it cannot tell you whether those are well-formed.
If your aim is to check whether your properties as a whole meet the Frictionless Data Package standard, you need to check your package and resource properties as a unit. See the Checking a full set of properties section for detailed instructions.
As an example, in the package properties below, there is a set of resource properties with all required fields missing. This will be ignored when the check runs:
package_properties = sp.PackageProperties(
    name="woolly-dormice",
    title="Hibernation Physiology of the Woolly Dormouse: A Scoping Review.",
    description=dedent('''
        This scoping review explores the hibernation physiology of the
        woolly dormouse, drawing on data collected over a 10-year period
        along the Taurus Mountain range in Turkey.
        '''
    ),
    id="123-abc-123",
    created="2014-05-14T05:00:01+00:00",
    version="1.0.0",
    licenses=[sp.LicenseProperties(name="odc-pddl")],
    resources=[sp.ResourceProperties()],
)
pprint(sp.check_package_properties(package_properties))
PackageProperties(name='woolly-dormice',
id='123-abc-123',
title='Hibernation Physiology of the Woolly Dormouse: A '
'Scoping Review.',
description='\n'
'This scoping review explores the hibernation '
'physiology of the\n'
'woolly dormouse, drawing on data collected over '
'a 10-year period\n'
'along the Taurus Mountain range in Turkey.\n',
homepage=None,
version='1.0.0',
created='2014-05-14T05:00:01+00:00',
contributors=None,
keywords=None,
image=None,
licenses=[LicenseProperties(name='odc-pddl',
path=None,
title=None)],
resources=[ResourceProperties(name=None,
path=None,
type=None,
title=None,
description=None,
sources=None,
licenses=None,
format=None,
mediatype=None,
encoding=None,
bytes=None,
hash=None,
schema=None)],
sources=None)
Checking resource properties
To check that a set of resource properties is complete and well-formed, we use the check_resource_properties() function.
In the resource properties below, the required fields path, title, and description are missing, and name doesn’t match the expected format. An error is raised for each problem:
resource_properties = sp.ResourceProperties(
    name="Woolly Dormice (2015, Toros Dağları)",
)
sp.check_resource_properties(resource_properties)
+ Exception Group Traceback (most recent call last):
| File "/home/runner/work/seedcase-sprout/seedcase-sprout/.venv/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3667, in run_code
| exec(code_obj, self.user_global_ns, self.user_ns)
| File "/tmp/ipykernel_2370/776683067.py", line 4, in <module>
| sp.check_resource_properties(resource_properties)
| File "/home/runner/work/seedcase-sprout/seedcase-sprout/src/seedcase_sprout/check_properties.py", line 110, in check_resource_properties
| raise error_info
| File "/home/runner/work/seedcase-sprout/seedcase-sprout/src/seedcase_sprout/check_properties.py", line 102, in check_resource_properties
| _generic_check_properties(
| File "/home/runner/work/seedcase-sprout/seedcase-sprout/src/seedcase_sprout/check_properties.py", line 166, in _generic_check_properties
| raise ExceptionGroup(
| ExceptionGroup: The following checks failed on the properties:
| PackageProperties(name=None, id=None, title=None, description=None, homepage=None, version=None, created=None, contributors=None, keywords=None, image=None, licenses=None, resources=[ResourceProperties(name='Woolly Dormice (2015, Toros Dağları)', path=None, type=None, title=None, description=None, sources=None, licenses=None, format=None, mediatype=None, encoding=None, bytes=None, hash=None, schema=None)], sources=None) (4 sub-exceptions)
+-+---------------- 1 ----------------
| seedcase_sprout.check_datapackage.check_error.CheckError: Error at `$.description` caused by `required`: 'description' is a required property
+---------------- 2 ----------------
| seedcase_sprout.check_datapackage.check_error.CheckError: Error at `$.name` caused by `pattern`: 'Woolly Dormice (2015, Toros Dağları)' does not match '^[a-z0-9._-]+$'
+---------------- 3 ----------------
| seedcase_sprout.check_datapackage.check_error.CheckError: Error at `$.path` caused by `required`: 'path' is a required property
+---------------- 4 ----------------
| seedcase_sprout.check_datapackage.check_error.CheckError: Error at `$.title` caused by `required`: 'title' is a required property
+------------------------------------
Checking a full set of properties
When we want to check both package and resource properties, we can use the check_properties() function. In the properties below, the required description field is missing in both the package and the resource properties. Although the resource path is set to a web page rather than a data file, the output below reports only the two missing description fields:
properties = sp.PackageProperties(
    name="woolly-dormice",
    title="Hibernation Physiology of the Woolly Dormouse: A Scoping Review.",
    id="123-abc-123",
    created="2014-05-14T05:00:01+00:00",
    version="1.0.0",
    licenses=[sp.LicenseProperties(name="odc-pddl")],
    resources=[sp.ResourceProperties(
        name="woolly-dormice-2015",
        title="Body fat percentage in the hibernating woolly dormouse",
        path="https://en.wikipedia.org/wiki/Woolly_dormouse"
    )],
)
sp.check_properties(properties)
+ Exception Group Traceback (most recent call last):
| File "/home/runner/work/seedcase-sprout/seedcase-sprout/.venv/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3667, in run_code
| exec(code_obj, self.user_global_ns, self.user_ns)
| File "/tmp/ipykernel_2370/3672494957.py", line 14, in <module>
| sp.check_properties(properties)
| File "/home/runner/work/seedcase-sprout/seedcase-sprout/src/seedcase_sprout/check_properties.py", line 74, in check_properties
| _generic_check_properties(properties)
| File "/home/runner/work/seedcase-sprout/seedcase-sprout/src/seedcase_sprout/check_properties.py", line 166, in _generic_check_properties
| raise ExceptionGroup(
| ExceptionGroup: The following checks failed on the properties:
| PackageProperties(name='woolly-dormice', id='123-abc-123', title='Hibernation Physiology of the Woolly Dormouse: A Scoping Review.', description=None, homepage=None, version='1.0.0', created='2014-05-14T05:00:01+00:00', contributors=None, keywords=None, image=None, licenses=[LicenseProperties(name='odc-pddl', path=None, title=None)], resources=[ResourceProperties(name='woolly-dormice-2015', path='resources/woolly-dormice-2015/data.parquet', type=None, title='Body fat percentage in the hibernating woolly dormouse', description=None, sources=None, licenses=None, format=None, mediatype=None, encoding=None, bytes=None, hash=None, schema=None)], sources=None) (2 sub-exceptions)
+-+---------------- 1 ----------------
| seedcase_sprout.check_datapackage.check_error.CheckError: Error at `$.description` caused by `required`: 'description' is a required property
+---------------- 2 ----------------
| seedcase_sprout.check_datapackage.check_error.CheckError: Error at `$.resources[0].description` caused by `required`: 'description' is a required property
+------------------------------------
Understanding error messages
Let’s have a closer look at the (end of the) error message we got in the previous section:
...
| raise ExceptionGroup(
| ExceptionGroup: The following checks failed on the properties:
| PackageProperties(name='woolly-dormice', id='123-abc-123', title='Hibernation Physiology of the Woolly Dormouse: A Scoping Review.', description=None, homepage=None, version='1.0.0', created='2014-05-14T05:00:01+00:00', contributors=None, keywords=None, image=None, licenses=[LicenseProperties(name='odc-pddl', path=None, title=None)], resources=[ResourceProperties(name='woolly-dormice-2015', path='resources/woolly-dormice-2015/data.parquet', type=None, title='Body fat percentage in the hibernating woolly dormouse', description=None, sources=None, licenses=None, format=None, mediatype=None, encoding=None, bytes=None, hash=None, schema=None)], sources=None) (2 sub-exceptions)
+-+---------------- 1 ----------------
| seedcase_sprout.check_datapackage.check_error.CheckError: Error at `$.description` caused by `required`: 'description' is a required property
+---------------- 2 ----------------
| seedcase_sprout.check_datapackage.check_error.CheckError: Error at `$.resources[0].description` caused by `required`: 'description' is a required property
+------------------------------------
First, the offending properties are printed, followed by the list of errors. Each error message is composed of the following parts:
- “seedcase_sprout.check_datapackage.check_error.CheckError”: the class representing the error. Check functions always raise CheckErrors for failing checks.
- “Error at $.resources[0].description”: the location of the error in the properties object. $ corresponds to the topmost layer of the object (the root); .resources points to the resources field of this layer; [0] indicates that the error is in the first resource properties object (counting from 0); and .description means that the description field of this resource properties object is at fault.
- “caused by required”: the kind of requirement or expectation that was violated. Here, the expectation is that the field should be present (i.e. it is a required field).
- “‘description’ is a required property”: a longer, human-readable explanation of the error.