VOCData#

Welcome to VOCData, the VOC Dataset Management System! This project is designed to facilitate the management and retrieval of datasets containing data about Volatile Organic Compounds (VOCs) and their subclasses. It aims to become a central source of information on worldwide VOC data and, building on that, to answer interesting questions about VOC data, e.g.:

  • Identifying geographical over- or under-representation of certain areas

  • Analysis of specific VOCs across datasets

  • Quick retrieval of contact info for datasets

  • and many more…

Our system offers a comprehensive API to handle various operations related to datasets, VOCs, VOC subclasses, research sites, and associated contacts and publications. You can find the full documentation here: Documentation

Key Features#

Dataset Management

  • Create Datasets: Add new datasets to the system.

  • Retrieve Datasets: Fetch all datasets or filter datasets by specific criteria such as site, country, VOC subclass, or geographic area.

  • Dataset Association: Link datasets to VOC subclasses and sites, allowing for hierarchical data organization and retrieval.

VOC and VOC Subclass Management

  • Create VOCs: Add new VOCs to the system.

  • Retrieve VOCs: Fetch all VOCs or filter VOCs by subclass name.

  • Add Relationships: Add relationships between VOCs and VOC subclasses, or hierarchical relationships between subclasses.

  • VOC Subclasses: Manage and retrieve VOC subclasses, including retrieving all subclasses associated with a specific VOC and hierarchical VOC Subclass relationships.
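To illustrate the hierarchical subclass relationships described above, here is a minimal, self-contained sketch in plain Python. The subclass and VOC names are made up for demonstration, and the real system stores these relations in the database rather than in dictionaries:

```python
# Illustrative sketch of hierarchical VOC subclass relationships.
# Names and structure are hypothetical; the real system models these
# relations in the database.

# Map each subclass to its parent subclass (None = top level)
SUBCLASS_PARENT = {
    "terpenes": None,
    "monoterpenes": "terpenes",
    "sesquiterpenes": "terpenes",
    "aromatics": None,
}

# Map each VOC to the subclasses it is directly assigned to
VOC_SUBCLASSES = {
    "isoprene": ["terpenes"],
    "alpha-pinene": ["monoterpenes"],
    "benzene": ["aromatics"],
}

def subclass_ancestry(subclass: str) -> list[str]:
    """Return the subclass and all its ancestors, most specific first."""
    chain = []
    current = subclass
    while current is not None:
        chain.append(current)
        current = SUBCLASS_PARENT[current]
    return chain

def subclasses_for_voc(voc: str) -> list[str]:
    """All subclasses associated with a VOC, including inherited ones."""
    result = []
    for sub in VOC_SUBCLASSES[voc]:
        for ancestor in subclass_ancestry(sub):
            if ancestor not in result:
                result.append(ancestor)
    return result

print(subclasses_for_voc("alpha-pinene"))  # ['monoterpenes', 'terpenes']
```

Walking the parent chain like this is what allows, e.g., a query for all "terpenes" datasets to also match datasets tagged with the more specific "monoterpenes" subclass.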

Site Management

  • Create Sites: Add new research sites where data is collected.

  • Retrieve Sites: Fetch all research sites or filter sites within a specific geographic area.

Contact Management

  • Create Contacts: Add new contacts associated with datasets.

  • Retrieve Contacts: Fetch all contacts or retrieve contacts associated with a specific dataset.

Publication Management

  • Create Publications: Add new publications referencing datasets.

  • Retrieve Publications: Fetch all publications in the system, or retrieve the publications referencing a specific dataset.

Quickstart#

The easiest way to get started with VOCData is via Docker Compose. This starts two containers: one for the database and one for the backend providing the API. Swagger UI automatically generates API docs, which are available under <your_host>:<your_port>/docs (localhost:80/docs) as soon as the app has started. It is recommended to check out the API docs to get familiar with the project and its functions. To get a better understanding of the underlying data structure, check out this model.

$ docker compose up --build

Development Environment#

There are two approaches to developing this project locally: either the “classic” way, by developing on your local machine, or directly inside a container using dev containers.

Dev Containers#

Developing from within a dev container eliminates discrepancies between your development environment (usually your local machine) and the container environment, and with them related issues (e.g. mismatched Python versions). It relies entirely on Docker and an IDE that supports developing inside a container (e.g. VS Code or PyCharm). Dev containers follow Microsoft's dev container specification. The dev container configuration can be found in .devcontainer/backend/devcontainer.json. While this is the fastest way to set up a local dev environment, it lacks some functionality compared to the regular approach (e.g. ease of debugging).

Local Development#

Prerequisites:

  • Python 3.10 and pip

$ python --version              # should print 3.10.x
$ pip --version

Setup the repository:

# Clone the repo
$ git clone https://github.com/lukagerlach/VOCData.git
$ cd VOCData
# Install and set up pre-commit hooks
$ pip install pre-commit
$ pre-commit install
# Navigate to the backend folder
$ cd backend
# Create a virtual environment named venv
$ python -m venv venv
# Activate the environment
$ venv\Scripts\activate             # Windows
$ source venv/bin/activate          # Unix-based OS
# Install dependencies
$ pip install -r requirements.txt

Startup application:

To start up the database, it is suggested to use Docker Compose, but to start only the database container. To do so, just run:

$ docker compose up db --build

Your database is now exposed to your local machine on the port specified in compose.yaml. The database is built from a PostGIS image so that it can natively handle geospatial data.
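PostGIS performs geographic filtering natively in SQL (e.g. checking whether a point lies within an envelope). As a rough illustration of the idea behind filtering sites by area, here is a plain-Python bounding-box filter; the site names and coordinates are made up for demonstration:

```python
# Illustrative bounding-box filter for sites, mimicking what a PostGIS
# query does inside the database. Site data is hypothetical.

sites = [
    {"name": "Hyytiala", "lat": 61.85, "lon": 24.29},
    {"name": "ATTO Tower", "lat": -2.14, "lon": -59.00},
    {"name": "Hohenpeissenberg", "lat": 47.80, "lon": 11.01},
]

def sites_in_area(sites, min_lat, max_lat, min_lon, max_lon):
    """Return all sites whose coordinates fall inside the bounding box."""
    return [
        s for s in sites
        if min_lat <= s["lat"] <= max_lat and min_lon <= s["lon"] <= max_lon
    ]

# All sites within a rough bounding box around Europe
european = sites_in_area(sites, 45.0, 65.0, 5.0, 30.0)
print([s["name"] for s in european])  # ['Hyytiala', 'Hohenpeissenberg']
```

Doing this in the database rather than in Python lets PostGIS use spatial indexes, which matters once the number of sites grows.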

Before starting up the FastAPI backend app, it is necessary to configure the database connection. While Docker handles this automatically when the backend runs in a container, running it locally requires some extra setup. To do so, create a database.env file and put in the following variable:

POSTGRES_SERVER=localhost

This way, your backend will look for the database on your local machine instead of inside the Docker network. Since this is a FastAPI app, just run the following command to start your backend:

$ fastapi run app/main.py --port 80 --reload

Your backend now runs on port 80 of your local machine. To check the API docs, open http://localhost:80/docs.
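As a rough sketch of how settings from a database.env file could feed into a database connection URL, consider the following plain-Python example. The actual backend configuration may be handled differently, and all variable names other than POSTGRES_SERVER are assumptions for illustration:

```python
# Illustrative sketch of parsing a database.env file into a connection
# URL. The real backend's config handling may differ; variable names
# other than POSTGRES_SERVER are hypothetical.

def load_env_file(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env

def connection_url(env: dict[str, str]) -> str:
    """Build a PostgreSQL connection URL, falling back to defaults."""
    user = env.get("POSTGRES_USER", "postgres")
    password = env.get("POSTGRES_PASSWORD", "postgres")
    server = env.get("POSTGRES_SERVER", "db")  # "db" = compose service name
    db = env.get("POSTGRES_DB", "vocdata")
    return f"postgresql://{user}:{password}@{server}:5432/{db}"

# With only POSTGRES_SERVER=localhost set, defaults fill in the rest
with open("database.env", "w") as f:
    f.write("POSTGRES_SERVER=localhost\n")
print(connection_url(load_env_file("database.env")))
# postgresql://postgres:postgres@localhost:5432/vocdata
```

This also shows why the single POSTGRES_SERVER override is enough for local development: inside Docker the hostname resolves to the compose service, while locally it must point at localhost.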

Useful Resources#

This project builds upon many libraries, tools, and technologies. To get a better understanding of how it works, these resources might be helpful:

Docker Docs

Docker Compose Docs

FastAPI Docs

OpenAPI Spec

Pydantic

SQLModel

SQLAlchemy

PostGIS

pre-commit

Sphinx

pgAdmin