# Tuva EMPI Backend

The backend for Tuva EMPI. It consists of a Django API and a PostgreSQL database.
## Development

### Installation

Inside the dev Docker container, you can run:

```shell
cd backend
make install-all
make migrate
make bootstrap
make run-dev
```
### Testing and formatting

- Run type checking: `make check`
- Run type checking and tests: `make test`
- Run the formatter: `make format`
- Run the linter: `make lint`
### Testing with localstack S3

When running locally, if you'd like to exercise the `/person-records/import` API endpoint, you can use localstack via the `awslocal` command.
#### Example

In a dev container terminal, run the API dev server:

```shell
make run-dev
```

Then, in another dev container terminal, `cd backend` and run:
- Create a bucket:

  ```shell
  aws s3api create-bucket --bucket tuva-health-local
  ```

- Upload a person records file:

  ```shell
  aws s3 cp main/tests/resources/tuva_synth/tuva_synth_clean.csv s3://tuva-health-local/raw-person-records.csv
  ```
- Open a web browser on the host and visit `localhost:9000`.
- Sign in (there is an initial test user with username `user` and password `test1234`).
- Once signed in, you might see a 502 Bad Gateway if the frontend web server isn't running. That's okay — we just want to open the browser dev tools, go to cookies, and copy the OAuth2 Proxy cookies (e.g. `_oauth2_proxy`) into environment variables in the dev container: `export AUTH_COOKIE="..."`. It's possible that there is more than one cookie.
- POST to the config endpoint with the cookie:

  ```shell
  http -v oauth2-proxy:9000/api/v1/config splink_settings:=@main/tests/resources/tuva_synth/tuva_synth_model.json potential_match_threshold:=0.5 auto_match_threshold:=1 "Cookie:_oauth2_proxy=$AUTH_COOKIE"
  ```

- POST to the import endpoint:

  ```shell
  http -v oauth2-proxy:9000/api/v1/person-records/import s3_uri=s3://tuva-health-local/raw-person-records.csv config_id=cfg_1 "Cookie:_oauth2_proxy=$AUTH_COOKIE"
  ```
### Processing jobs

Run the matching service so it can process jobs:

```shell
make worker
```
### Connecting to the DB directly

When running locally, you can connect to the PostgreSQL database from a dev container terminal via:

```shell
PGPASSWORD=tuva_empi psql -h db -U tuva_empi
```
### Clearing the database

If you'd like to start from scratch with Postgres, either delete the Docker volume where the PostgreSQL data is stored, or drop and recreate the database:

```shell
PGPASSWORD=tuva_empi psql -h db -U tuva_empi postgres
```

```sql
drop database tuva_empi;
create database tuva_empi;
```
### Testing with AWS Cognito

If you'd like to connect to AWS Cognito, you will need to create an AWS account and set up a test AWS Cognito user pool. For the callback URL in Cognito, use `http://localhost:9000/oauth2/callback`.

If you are switching from Keycloak, it's best to remove all containers (`cd .devcontainer && docker compose down`) and clear your database by removing the DB volume.
Then locally, you will need to disable localstack and load your AWS config.

In `.env`:

- Uncomment `AWS_CONFIG_FILE`
- Comment out `AWS_ENDPOINT_URL`, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
- Modify `OAUTH2_PROXY_OIDC_ISSUER_URL`, `OAUTH2_PROXY_SCOPE`, `OAUTH2_PROXY_CLIENT_ID` and `OAUTH2_PROXY_CLIENT_SECRET` to match your AWS Cognito test user pool settings (AWS Cognito doesn't support the `offline_access` scope)
In `local.json`:

- Set `idp.backend` to `aws-cognito`
- Set `cognito_user_pool_id`, `jwks_url` and `client_id` based on your AWS Cognito test user pool settings
- Set `initial_setup.admin_email` to a valid user's email address from your test user pool
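For orientation, a `local.json` fragment along these lines might result. All values are placeholders, and the exact nesting of the Cognito keys may differ in this repo — check the existing `local.json` for the authoritative layout:

```json
{
  "idp": {
    "backend": "aws-cognito",
    "cognito_user_pool_id": "us-east-1_XXXXXXXXX",
    "jwks_url": "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_XXXXXXXXX/.well-known/jwks.json",
    "client_id": "your-app-client-id"
  },
  "initial_setup": {
    "admin_email": "admin@example.com"
  }
}
```

The `jwks_url` follows Cognito's standard pattern of `https://cognito-idp.<region>.amazonaws.com/<user-pool-id>/.well-known/jwks.json`.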
Then you can rebuild and reopen the backend Docker container.

In the dev container:

- Set the AWS profile: `export AWS_PROFILE=...`
- Log in to AWS: `aws sso login`
- Check your identity: `aws sts get-caller-identity`
- Then follow the steps in Installation to run migrations and bootstrap
## Migrations

To re-apply migrations from scratch:

- Clear the database: `python manage.py flush`
- Undo migrations: `python manage.py migrate main zero`
- Apply migrations: `make migrate app=main`
To add a new auto migration:

- Make changes to the models
- Run `python manage.py makemigrations`

To add a new manual migration:

- Create a new migration file in the migrations directory

To run migrations:

```shell
make migrate app=main
```
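A manual migration is just a regular Django migration module written by hand. As a minimal sketch — the filename, dependency, and SQL below are hypothetical placeholders, not from this repo:

```python
# main/migrations/0002_add_example_index.py (hypothetical name)
from django.db import migrations


class Migration(migrations.Migration):
    # Point this at the actual latest migration in main/migrations
    dependencies = [("main", "0001_initial")]

    operations = [
        migrations.RunSQL(
            sql="CREATE INDEX IF NOT EXISTS ix_example ON example_table (created);",
            reverse_sql="DROP INDEX IF EXISTS ix_example;",
        )
    ]
```

`migrations.RunSQL` takes a `reverse_sql` argument so the migration can also be unapplied (e.g. by `python manage.py migrate main zero`).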
## Config

Currently, there are a couple of config environments:

- local (local development)
- ci (GitHub Actions)

But really, an environment is just a name. The only one with special meaning is "local", because `settings.py` sets certain things when the env is set to "local".
## Contributing

### Django ORM

In general, prefer raw SQL to the Django ORM for complex queries and queries involving a join. For complex queries, it's clearer to stick with a single mental model (SQL), and it's often more performant: we can craft the query exactly how we want it and easily review that it is what we want. For queries involving joins, the Django ORM doesn't provide the most intuitive interface. For example, `select_related` switches between LEFT OUTER and INNER join based on different conditions: https://github.com/django/django/blob/56e23b2319cc29e6f8518f8f21f95a530dddb930/django/db/models/sql/query.py#L1121-L1133. This seems like an easy way to introduce a critical bug. Data integrity is extremely important in this application, so let's not muddy the waters with complex abstractions. For simple create/get/filter/count/update queries, the Django ORM is fine and can be easier to read.
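To illustrate the point — using Python's built-in `sqlite3` as a stand-in for this project's Django/Postgres stack, with hypothetical table names — when you write the SQL yourself, the join type is stated in the query, so a LEFT OUTER JOIN can never silently become an INNER JOIN:

```python
import sqlite3

# In-memory stand-in for the real database; tables and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_record (id INTEGER PRIMARY KEY, person_id INTEGER, source TEXT);
    INSERT INTO person VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO person_record VALUES (10, 1, 'clinic_a');
    """
)

# The join type is explicit: persons without records are guaranteed to
# appear with NULLs, and no ORM heuristic can change that behind our back.
rows = conn.execute(
    """
    SELECT p.name, r.source
    FROM person p
    LEFT OUTER JOIN person_record r ON r.person_id = p.id
    ORDER BY p.id
    """
).fetchall()

print(rows)  # [('Ada', 'clinic_a'), ('Grace', None)]
```

In Django, the same raw-SQL approach typically goes through `django.db.connection.cursor()` (or `Model.objects.raw()` when mapping rows back to a model).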