Creating and managing properties in a data package

Warning

🚧 Sprout is still in active development and evolving quickly, so the documentation and functionality may not work as described and could undergo substantial changes 🚧

At the core of Sprout is the “data package”, which is a standardized way of structuring and documenting data. This guide will show you how to create and manage data packages using Sprout.

Important

Sprout assumes you have control over your system’s files and folders, or at least your user’s home directory. This includes having access to a server through the Terminal where you can write to specific folders.

Creating a data package

Sprout is designed assuming the working directory is the root of your Python project, where your .git/ folder and/or pyproject.toml file are located.
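
A quick way to confirm that the working directory is the project root is to look for those marker files. The sketch below uses only the standard library; the helper name is_project_root is hypothetical and not part of Sprout.

```python
from pathlib import Path


def is_project_root(folder: Path) -> bool:
    """Return True if `folder` has a `.git/` folder or a `pyproject.toml` file."""
    return (folder / ".git").is_dir() or (folder / "pyproject.toml").is_file()


if __name__ == "__main__":
    # Check the current working directory before running any Sprout code.
    print(is_project_root(Path.cwd()))
```

If this prints False, change into the project folder (here, diabetes-study/) before continuing.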

Right now, your file structure should look a bit like:

📁 diabetes-study/
├─📄 .gitignore
├─📄 .python-version
├─📄 README.md
├─📄 main.py
└─📄 pyproject.toml

Now, time to make a data package! A data package always needs a datapackage.json file in the root of the project. It contains a set of properties, that is, metadata about the data package and the data resources within it. You don’t have a datapackage.json file yet. Before you can create one, you need to write out the properties for the data package.
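
For orientation, datapackage.json is plain JSON whose top-level keys are the properties. The sketch below builds a small example with the standard library only, using field names that appear in the properties template shown later in this guide. The values are illustrative, and the file Sprout writes may contain more fields or order them differently.

```python
import json

# Illustrative values only; in the template, Sprout autogenerates fields
# such as `id`, `version`, and `created`.
example_properties = {
    "name": "diabetes-study",
    "title": "A Study on Diabetes",
    "description": "Data from a 2021 study on diabetes prevalence.",
    "licenses": [{"name": "ODC-BY-1.0"}],
    "version": "0.1.0",
}

# Serialize the properties the way a JSON file would store them.
print(json.dumps(example_properties, indent=2))
```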

While you could create a datapackage.json file and add the properties to it by hand, you would need to know the exact fields and values to fill in. To ease this process, Sprout provides a way to create the properties using Python scripts. So let’s create this script first, by adding it as a step in the main.py file!

The main.py file is where you will write the code to create and manage your data package, so that you can easily recreate your data package and its resources. You can think of it as a pipeline that takes you from beginning to end of creating your data package. Open the main.py file in your Python project, delete everything in it, and paste in the code below.

import seedcase_sprout as sp

def main():
    # Create the properties script in the default location.
    sp.create_properties_script()

if __name__ == "__main__":
    main()

Then, run this command in the Terminal:

Terminal
uv run main.py

This will create a properties.py file in the newly created scripts/ folder of your data package.

Caution

Because of the way Python scripts and importing work, you should also create an __init__.py file in the scripts/ folder. You can do this by running the following command in your Terminal:

Terminal
touch scripts/__init__.py
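
The touch command is not available in every shell (for example, the Windows Command Prompt). As an alternative, the same file can be created from Python with the standard library. This is only a sketch; the function name ensure_init_file is hypothetical and not part of Sprout.

```python
from pathlib import Path


def ensure_init_file(project_root: Path) -> Path:
    """Create an empty `scripts/__init__.py` under `project_root` if it is missing."""
    scripts_folder = project_root / "scripts"
    # Do nothing if the folder already exists.
    scripts_folder.mkdir(exist_ok=True)
    init_file = scripts_folder / "__init__.py"
    # `touch` leaves an existing file's contents unchanged.
    init_file.touch(exist_ok=True)
    return init_file
```

Calling ensure_init_file(Path.cwd()) from the project root has the same effect as the touch command above.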

The file structure should now look like:

📁 diabetes-study/
├─📁 scripts/
│ ├─📄 __init__.py
│ └─📄 properties.py
├─📄 .gitignore
├─📄 .python-version
├─📄 README.md
├─📄 main.py
└─📄 pyproject.toml

Inside the scripts/properties.py file, you will find a template for creating the properties of your data package. It looks like:

import seedcase_sprout as sp

# from .resource_properties import resource_properties

properties = sp.PackageProperties(
    ## Required:
    name="diabetes-study",
    title="",
    description="",
    licenses=[
        sp.LicenseProperties(
            ## Required:
            name="",
            ## Optional:
            # path="",
            # title="",
        ),
    ],
    ## Optional:
    # homepage="",
    # contributors=[
    #    sp.ContributorProperties(
    #        ## Required:
    #        title="",
    #        ## Optional:
    #        path="",
    #        email="",
    #        given_name="",
    #        family_name="",
    #        organization="",
    #        roles=[""],
    #    ),
    # ],
    # keywords=[""],
    # image="",
    # sources=[
    #    sp.SourceProperties(
    #        ## Required:
    #        title="",
    #        ## Optional:
    #        path="",
    #        email="",
    #        version="",
    #    ),
    # ],
    # resources=[
    #     resource_properties,
    # ],
    ## Autogenerated:
    id="757dae38-55bf-4f99-b33d-961018db26a7",
    version="0.1.0",
    created="2025-06-25T14:07:27+00:00",
)

You can now start filling in that script. Use the included comments as a guide, making sure to fill in the required fields, and consult the documentation for PackageProperties to know what to write. The core of a data package is its properties. Without them, your data are simply a collection of files without any context or meaning. The metadata (properties) are crucial for understanding and actually using the data in your data package!

To help with writing the properties, Sprout includes several properties data classes, such as PackageProperties, LicenseProperties, and ContributorProperties, to make it easier for you to make properties with the correct fields filled in. See the PackageProperties documentation for more details about these data classes.

For now, let’s write some basic properties for your data package. Below is an example of a set of properties with required fields filled in (including the optional but highly recommended contributors field).

properties = sp.PackageProperties(
    name="diabetes-study",
    title="A Study on Diabetes",
    # You can write Markdown below, with the helper `sp.dedent()`.
    description=sp.dedent("""
        # Data from a 2021 study on diabetes prevalence

        This data package contains data from a study conducted in 2021 on the
        *prevalence* of diabetes in various populations. The data includes:

        - demographic information
        - health metrics
        - survey responses about lifestyle
        """),
    contributors=[
        sp.ContributorProperties(
            title="Jamie Jones",
            email="jamie_jones@example.com",
            path="https://example.com/jamie_jones",
            roles=["creator"],
        )
    ],
    licenses=[
        sp.LicenseProperties(
            name="ODC-BY-1.0",
            path="https://opendatacommons.org/licenses/by",
            title="Open Data Commons Attribution License 1.0",
        )
    ],
    # We don't include the rest of the properties script in this guide. The above only
    # shows what it might look like to write properties in the script.
)
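
The sp.dedent() helper lets you indent the Markdown text so it lines up with the surrounding code. The standard library's textwrap.dedent() does a comparable job and illustrates the idea. Whether sp.dedent() treats leading and trailing blank lines the same way is an assumption not confirmed here, so the sketch strips them explicitly.

```python
import textwrap

indented = """
    # Data from a 2021 study on diabetes prevalence

    - demographic information
    - health metrics
    """

# Remove the common leading whitespace, then trim the surrounding blank lines.
description = textwrap.dedent(indented).strip()
print(description)
```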

Now that you’ve filled in some of the package properties, it’s time to create your datapackage.json file with these properties. You can use the write_properties() function for this. It works best when called inside the main() function of main.py, so that it runs as part of your build pipeline. So in your main.py, include this code:

import seedcase_sprout as sp
from scripts.properties import properties

def main():
    # Create the properties script in the default location.
    sp.create_properties_script()
    # Write properties from properties script to `datapackage.json`.
    sp.write_properties(properties=properties)

if __name__ == "__main__":
    main()

Then, run this command in the Terminal:

Terminal
uv run main.py

Important

The write_properties() function will give an error if the PackageProperties object is missing any of its required fields or if they are not filled in correctly. In that case, a datapackage.json file won’t be created, and you will have to return to the scripts/properties.py file and fill in the correct properties.
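
To make the idea of a missing required field concrete, the sketch below checks a plain dictionary for the fields that the properties template marks as required. It uses only the standard library; find_missing_required is a hypothetical helper, and Sprout's own checks are likely stricter than this.

```python
# Fields marked "Required" in the properties template.
REQUIRED_FIELDS = ("name", "title", "description", "licenses")


def find_missing_required(properties: dict) -> list[str]:
    """Return the required fields that are absent or empty in `properties`."""
    missing = [field for field in REQUIRED_FIELDS if not properties.get(field)]
    # Every license also needs a non-empty `name`.
    for index, license_ in enumerate(properties.get("licenses") or []):
        if not license_.get("name"):
            missing.append(f"licenses[{index}].name")
    return missing


# The unedited template leaves `title`, `description`, and the license name blank.
template = {
    "name": "diabetes-study",
    "title": "",
    "description": "",
    "licenses": [{"name": ""}],
}
print(find_missing_required(template))
```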

The write_properties() function creates the datapackage.json file, containing the properties you wrote, in the root of your diabetes-study folder. You will now see it in your data package folder:

📁 diabetes-study/
├─📁 scripts/
│ ├─📄 __init__.py
│ └─📄 properties.py
├─📄 .gitignore
├─📄 .python-version
├─📄 README.md
├─📄 datapackage.json
├─📄 main.py
└─📄 pyproject.toml

Creating a README of the properties

Having a human-readable version of what is contained in the datapackage.json file is useful for others who may be working with or wanting to learn more about your data package. You can use as_readme_text() to convert the properties into text that can be added to a README file. Let’s create a README file with the properties of the data package you just created by writing it in the main.py file.

import seedcase_sprout as sp
from scripts.properties import properties

def main():
    # Create the properties script in the default location.
    sp.create_properties_script()
    # Save the properties to `datapackage.json`.
    sp.write_properties(properties=properties)
    # Create text for a README of the data package.
    readme_text = sp.as_readme_text(properties)
    # Write the README text to a `README.md` file.
    sp.write_file(readme_text, sp.PackagePath().readme())

if __name__ == "__main__":
    main()

Sprout splits the README creation functionality into two steps: one to make the text and one to write it to the file. That way, if you want to add or manipulate the text, you can do so before writing it to the file. This is useful if you want to add information to the README that you don’t want included in the datapackage.json file. This guide doesn’t cover how or why to do this.

Next, run this command in the Terminal to make the README file. The write_file() function always overwrites the existing README file.

Terminal
uv run main.py

Your data package now contains the README.md file generated from the properties:

📁 diabetes-study/
├─📁 scripts/
│ ├─📄 __init__.py
│ └─📄 properties.py
├─📄 .gitignore
├─📄 .python-version
├─📄 README.md
├─📄 datapackage.json
├─📄 main.py
└─📄 pyproject.toml

Editing package properties

If you made a mistake and want to update the properties in the current datapackage.json, edit the scripts/properties.py script where you wrote the properties. Since everything is written in Python scripts, updating those scripts and re-running your build pipeline (main.py) will then update everything.

If you need help with filling in the right properties, see the documentation for the PackageProperties class or run, for example, print(sp.PackageProperties()) to get a list of all the fields you can fill in for a package.
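
Since the properties classes are described as data classes, the standard library's dataclasses.fields() is another way to list field names. The sketch below uses a small stand-in class rather than Sprout itself; ExampleProperties is hypothetical, and that the same call works on sp.PackageProperties is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ExampleProperties:
    """Hypothetical stand-in for a properties class; not Sprout's definition."""

    name: Optional[str] = None
    title: Optional[str] = None
    description: Optional[str] = None


# Collect the field names in the order they are declared.
field_names = [field.name for field in fields(ExampleProperties)]
print(field_names)
```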

You now have the basic starting point for adding data resources to your data package.