# Tuva EMPI Backend

The backend for Tuva EMPI. It consists of a Django API and a PostgreSQL database.
## Development

### Installation

Inside the dev Docker container, you can run:

```shell
cd backend
make install-all
make migrate
make bootstrap
make run-dev
```
### Testing and formatting

- Run type checking: `make check`
- Run type checking and tests: `make test`
- Run the formatter: `make format`
- Run the linter: `make lint`
### Testing with localstack S3

When running locally, if you'd like to exercise the `/person-records/import` API endpoint, you can use localstack via the `awslocal` command.
#### Example

In a dev container terminal, run the API dev server:

```shell
make run-dev
```

Then, in another dev container terminal, `cd backend` and run:
- Create a bucket:

  ```shell
  aws s3api create-bucket --bucket tuva-health-local
  ```

- Upload a person records file:

  ```shell
  aws s3 cp main/tests/resources/tuva_synth/tuva_synth_clean.csv s3://tuva-health-local/raw-person-records.csv
  ```
- Open a web browser on the host and visit `localhost:9000`.
- Sign in (there is an initial test user with username `user` and password `test1234`).
- Once signed in, you might see a 502 Bad Gateway if the frontend web server isn't running. That's okay — we just want to open the browser dev tools, go to cookies, and copy the OAuth2 Proxy cookies (e.g. `_oauth2_proxy`) into environment variables in the dev container: `export AUTH_COOKIE="..."`. It's possible that there is more than one cookie.
- POST to the config endpoint with the cookie:

  ```shell
  http -v oauth2-proxy:9000/api/v1/config splink_settings:=@main/tests/resources/tuva_synth/tuva_synth_model.json potential_match_threshold:=0.5 auto_match_threshold:=1 "Cookie:_oauth2_proxy=$AUTH_COOKIE"
  ```

- POST to the import endpoint:

  ```shell
  http -v oauth2-proxy:9000/api/v1/person-records/import s3_uri=s3://tuva-health-local/raw-person-records.csv config_id=cfg_1 "Cookie:_oauth2_proxy=$AUTH_COOKIE"
  ```
### Processing jobs

Run the matching service so it can process jobs:

```shell
make worker
```
### Connecting to the DB directly

When running locally, you can connect to the PostgreSQL database from a dev container terminal via:

```shell
PGPASSWORD=tuva_empi psql -h db -U tuva_empi
```
### Clearing the database

If you'd like to start from scratch with Postgres, either delete the Docker volume where the PostgreSQL data is stored, or drop and recreate the database:

```shell
PGPASSWORD=tuva_empi psql -h db -U tuva_empi postgres
```

```sql
drop database tuva_empi;
create database tuva_empi;
```
### Testing with AWS Cognito

If you'd like to connect to AWS Cognito, you will need to create an AWS account and set up a test AWS Cognito user pool. For the callback URL in Cognito, use `http://localhost:9000/oauth2/callback`.

If you are switching from Keycloak, it's best to remove all containers (`cd .devcontainer && docker compose down`) and clear your database by removing the DB volume.
Then locally, you will need to disable localstack and load your AWS config.

In `.env`:

- Uncomment `AWS_CONFIG_FILE`
- Comment out `AWS_ENDPOINT_URL`, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
- Modify `OAUTH2_PROXY_OIDC_ISSUER_URL`, `OAUTH2_PROXY_SCOPE`, `OAUTH2_PROXY_CLIENT_ID` and `OAUTH2_PROXY_CLIENT_SECRET` to match your AWS Cognito test user pool settings (AWS Cognito doesn't support the `offline_access` scope)
In `local.json`:

- Set `idp.backend` to `aws-cognito`
- Set `cognito_user_pool_id`, `jwks_url` and `client_id` based on your AWS Cognito test user pool settings
- Set `initial_setup.admin_email` to a valid user's email address from your test user pool
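For orientation, a `local.json` fragment along these lines might result. All values are placeholders, and the exact nesting of the Cognito keys may differ in this repo — check the existing `local.json` for the authoritative layout:

```json
{
  "idp": {
    "backend": "aws-cognito",
    "cognito_user_pool_id": "us-east-1_XXXXXXXXX",
    "jwks_url": "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_XXXXXXXXX/.well-known/jwks.json",
    "client_id": "your-app-client-id"
  },
  "initial_setup": {
    "admin_email": "admin@example.com"
  }
}
```

The `jwks_url` follows Cognito's standard pattern of `https://cognito-idp.<region>.amazonaws.com/<user-pool-id>/.well-known/jwks.json`.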
Then you can rebuild and reopen the backend Docker container.

In the dev container:

- Set the AWS profile: `export AWS_PROFILE=...`
- Log in to AWS: `aws sso login`
- Check your identity: `aws sts get-caller-identity`
- Then follow the steps in Installation to run migrations and bootstrap
## Migrations

To re-apply migrations from scratch:

- Clear the database: `python manage.py flush`
- Undo migrations: `python manage.py migrate main zero`
- Apply migrations: `make migrate app=main`
To add a new auto migration:

- Make changes to the models
- Run `python manage.py makemigrations`

To add a new manual migration:

- Create a new migration file in the migrations directory

To run migrations:

```shell
make migrate app=main
```
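A manual migration is just a regular Django migration module written by hand. As a minimal sketch — the filename, dependency, and SQL below are hypothetical placeholders, not from this repo:

```python
# main/migrations/0002_add_example_index.py (hypothetical name)
from django.db import migrations


class Migration(migrations.Migration):
    # Point this at the actual latest migration in main/migrations
    dependencies = [("main", "0001_initial")]

    operations = [
        migrations.RunSQL(
            sql="CREATE INDEX IF NOT EXISTS ix_example ON example_table (created);",
            reverse_sql="DROP INDEX IF EXISTS ix_example;",
        )
    ]
```

`migrations.RunSQL` takes a `reverse_sql` argument so the migration can also be unapplied (e.g. by `python manage.py migrate main zero`).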
## Config

Currently, there are a couple of config environments:

- local (local development)
- ci (GitHub Actions)

But really, an environment is just a name. The only one with special meaning is "local", because `settings.py` sets certain things when the env is set to "local".
## Contributing

### Django ORM

In general, prefer raw SQL to the Django ORM for complex queries and queries involving a join. For complex queries, it's clearer to stick with a single mental model (SQL), and it's often more performant: we can craft the query exactly how we want it and easily review that it is what we want. For queries involving joins, the Django ORM doesn't provide the most intuitive interface. For example, `select_related` switches between LEFT OUTER and INNER join based on different conditions: https://github.com/django/django/blob/56e23b2319cc29e6f8518f8f21f95a530dddb930/django/db/models/sql/query.py#L1121-L1133. This seems like an easy way to introduce a critical bug. Data integrity is extremely important in this application, so let's not muddy the waters with complex abstractions. For simple create/get/filter/count/update queries, the Django ORM is fine and can be easier to read.
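To illustrate the point — using Python's built-in `sqlite3` as a stand-in for this project's Django/Postgres stack, with hypothetical table names — when you write the SQL yourself, the join type is stated in the query, so a LEFT OUTER JOIN can never silently become an INNER JOIN:

```python
import sqlite3

# In-memory stand-in for the real database; tables and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_record (id INTEGER PRIMARY KEY, person_id INTEGER, source TEXT);
    INSERT INTO person VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO person_record VALUES (10, 1, 'clinic_a');
    """
)

# The join type is explicit: persons without records are guaranteed to
# appear with NULLs, and no ORM heuristic can change that behind our back.
rows = conn.execute(
    """
    SELECT p.name, r.source
    FROM person p
    LEFT OUTER JOIN person_record r ON r.person_id = p.id
    ORDER BY p.id
    """
).fetchall()

print(rows)  # [('Ada', 'clinic_a'), ('Grace', None)]
```

In Django, the same raw-SQL approach typically goes through `django.db.connection.cursor()` (or `Model.objects.raw()` when mapping rows back to a model).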