This is the repository for submitting the project for course 2. The v*_project-submission releases are the intended code versions for submission. Please choose the latest one.
This project is based on the starter project provided at https://github.com/udacity/nd064-c2-message-passing-projects-starter. From this, the following code has been copied / modified:
- frontend application (applications/udaconnect-ms-frontend): minor modifications to the `Dockerfile` and to the API URLs in `Connection.js` and `Person.js`. Otherwise unchanged.
- database models for sqlalchemy (build/types/models.py) with major modifications for refactoring
The Kafka Helm Charts to deploy the Kafka server have been copied unchanged from https://github.com/bitnami/charts/tree/main/bitnami/kafka (licensed under the Apache License 2.0) except for some minor modifications in the values file.
- applications: source code for all refactored custom microservices
- build/types: common models, schemas and proto files shared by all applications. These files are copied by the CI workflow to avoid duplicates and allow central management of all message definitions.
- .github/workflows/app-build.yaml: GitHub CI workflow to build and push applications to the Docker Hub registry hub.docker.com/u/omarnava
- argocd: argocd application config files. Follows the app-of-apps schema.
- charts: helm charts for Kafka (copied from bitnami/kafka)
- manifests: kubernetes declarative deployment manifests for all other applications
- docs: additional documentation required for project submission
Assuming you have kubernetes, kubectl and argocd ready:
- Configure volumes on the host: pgAdmin and Kafka need specific user IDs on the host to access the mounted volumes from their containers.

  ```bash
  mkdir -p /mnt/data/postgres
  mkdir -p /mnt/data/pgadmin
  mkdir -p /mnt/data/kafka
  chown -R 5050:5050 /mnt/data/pgadmin
  chown -R 1001:1001 /mnt/data/kafka
  ```

- Deploy with ArgoCD:

  ```bash
  kubectl apply -f argocd/udaconnect.yaml
  ```

- Sync applications:
  - For this project, auto-sync is not enabled, so you need to sync the apps manually.
  - The ArgoCD setup follows the App of Apps pattern. The udaconnect app acts as the parent app that needs to be synced first.
  - You can then sync all the other apps. I recommend syncing the configuration, postgres and kafka apps first, as the other apps depend on them.
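If you prefer the argocd CLI over the web UI, the manual sync could look like the following; the app names beyond udaconnect are assumptions based on the config files in argocd/:

```bash
argocd app sync udaconnect                    # parent app first
argocd app sync configuration postgres kafka  # infrastructure apps the others depend on
argocd app list                               # look up and sync the remaining apps by name
```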
Available endpoints after cluster startup:
- The following endpoints are externally available on your host system:
- localhost:30100: pgAdmin Frontend
- localhost:30101: Udaconnect Frontend
- localhost:30102: LocationAPI
- localhost:30103: PersonAPI
- The following endpoints are cluster-internal only:
- postgres-svc:5432: PostgreSQL database
- kafka:9092: Kafka server
- person-service-svc:5005: gRPC server
Generate some data:
- Generate some mock persons using the PersonAPI endpoint (cf. postman collection). The interactive SwaggerUI provides you with some mock data you can directly `POST`. I recommend adding at least two different persons.
- Generate some mock location data using the LocationAPI endpoint (cf. postman collection). Again, you can use the interactive SwaggerUI to `POST` some data. Alternatively, you can start mock clients that generate random location data for particular persons:

  ```bash
  python applications/udaconnect-test-clients/client.py -i PERSON_ID
  ```

Note that due to the async database update (location-service) and the async connections calculation (exposure-service), it may take two or three minutes until you see the data in the frontend.
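If you prefer a scripted setup over the SwaggerUI, a plain HTTP client works as well. The following is a minimal sketch using `requests`; the endpoint paths and payload fields are assumptions, so verify them against the interactive docs at /docs:

```python
import requests

# Assumed payload shape; verify against the PersonAPI SwaggerUI.
person = {"first_name": "Ada", "last_name": "Lovelace", "company_name": "ACME"}
resp = requests.post("http://localhost:30103/api/persons", json=person)
resp.raise_for_status()
person_id = resp.json()["id"]  # assuming the API returns the created person

# Assumed payload shape; verify against the LocationAPI SwaggerUI.
location = {
    "person_id": person_id,
    "latitude": 37.55,
    "longitude": -122.29,
    "creation_time": "2024-01-01T10:00:00",
}
requests.post("http://localhost:30102/api/locations", json=location).raise_for_status()
```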
The deployment has been tested with k3s and ArgoCD v2.8.0 using an Ubuntu 22.04.3 LTS Server environment.
Major architecture design decisions:
- In the given starter setup, both person and location endpoints were merged into one central API. The business requirements state, however, that the location endpoint should be fit for large ingests of data. I therefore decided to refactor both endpoints into separate services which, thanks to kubernetes, can be independently scaled:
  - location-api: This public endpoint can be used for location data ingestion. The REST API serves as a type-checking gateway before messages are put on a Kafka message queue. The `POST` endpoint returns a `201` once a message has been put on the queue; no further calculation or database operation is required to `POST` a new location message. Consequently, the per-message load on this instance is rather small. In a nutshell, the LocationAPI is a Kafka publisher with a REST gateway up front (see the sketches after this list).
  - person-api: This public REST API contains all the endpoints for the frontend. We can expect frequent but far fewer requests to this API than to the LocationAPI. Thanks to the separation between both services, the PersonAPI's liveness does not depend on the LocationAPI in case of unforeseen, exceptionally high location data traffic. The PersonAPI implements a gRPC client to precisely define internal message formats and message passing with its companion person-service module. As `person` and `connection` objects are more complex than `location` message schemas, we can take advantage of gRPC's type enforcement.
- In the starter setup, each `connection` request requires an online calculation of the closest encounter of each person with all other persons recorded. This operation is computationally expensive and potentially slow once many `persons` and many `locations` are recorded in our database. Besides, the starter setup puts the heavy computation onto the postgres instance, challenging its availability for other incoming requests. The business requirements state that people use Udaconnect to check which other persons they met at conferences around the world. Based on this, we can make the following assumptions to refactor the starter setup:
  - Conferences are usually longer events stretching over one or even more days. As a user of Udaconnect I know that the last conference I attended was on day X. I am interested in all my encounters on that day, no matter whether I met them at 2pm or 4pm, assuming that I can hardly attend two different conferences at two different places on a single day. This means it is sufficient to consider only day-wise slices of our location data.
  - When I am at a conference, I am focused on my experiences there and probably don't bother checking who I met 5 minutes ago. Typically, I want to check my encounters on the way back, sitting in a plane or a train reviewing my conference notes. This means it is sufficient to provide current but not real-time connection information. If I am checking my connections for yesterday's conference, it doesn't matter whether these results were calculated a second or an hour ago; it actually won't even change the result. We can therefore decouple the connection request from the connection calculation.
- Given the above considerations, the heavy computation and data handling parts are offloaded to three background services:
  - exposure-service: This routine regularly queries the `location` table for all location data for the current date and then calculates the distances between all geolocation points for all persons. The computation outcome is stored in a separate `exposure` table with an entry for each person pair storing their minimum distance, the associated location ids and the current date. Querying this `exposure` table allows retrieving all connections that are within a certain time window and below a certain distance threshold, which is far faster than calculating the distances at request time. Also, the distance calculation is not done in the postgres module but in the exposure-service container, separating data storage and data computation more strictly. The cycle time for the exposure calculation routine is set to 60s for easier testing but could be set to one hour for productive use. In the end, one has to decide how fresh the exposure data should be and how heavy the computational load for the exposure service may become; thanks to the separation, this tradeoff can now be made.
  - location-service: The location-api puts new location data on the Kafka queue without bothering about storage; storing is what the location-service does. It implements a Kafka consumer for the `locations` topic and regularly (but not too frequently) polls the queue for new location data. Again, we do not need to live-update our location database; we just want to make sure that we do not miss anything. Thanks to the queuing technology, the location-api can quickly ingest new data while the location-service asynchronously polls batches of location messages from the queue and inserts them into the database, as shown in the consumer sketch below. Inserting hundreds of messages as one batch is far faster than doing hundreds of single database inserts.
  - person-service: This service implements the gRPC server that talks to the person-api client and establishes the database connection to retrieve/insert person-related data (new persons or connections). Again, we can separate and individually scale the database read/write and the data input/output functionalities.
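The following sketches illustrate the two ends of the Kafka pipeline. They are minimal, hypothetical examples using kafka-python, FastAPI and SQLAlchemy; the actual topic name, payload schema and connection settings live in the application code and in build/types, so treat every name below as an assumption.

```python
# Hypothetical sketch of the LocationAPI ingestion path: a FastAPI POST
# endpoint that type-checks the payload via a Pydantic model and publishes
# it to Kafka. No database work happens at request time.
import json
import os

from fastapi import FastAPI, status
from kafka import KafkaProducer
from pydantic import BaseModel

app = FastAPI()
producer = KafkaProducer(
    bootstrap_servers=os.getenv("KAFKA_HOST", "kafka:9092"),
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)

class LocationIngest(BaseModel):
    # Assumed fields; the real schema lives in build/types/schemas.py.
    person_id: int
    latitude: float
    longitude: float
    creation_time: str

@app.post("/api/locations", status_code=status.HTTP_201_CREATED)
def create_location(location: LocationIngest) -> LocationIngest:
    producer.send("locations", location.dict())  # enqueue and return immediately
    return location
```

On the consuming side, the location-service would poll the queue in batches and write each batch with a single statement instead of hundreds of individual inserts:

```python
# Hypothetical sketch of the location-service consumer loop. Topic, table
# and credentials are placeholders; the real values come from the cluster
# configuration (cf. manifests/).
import json

from kafka import KafkaConsumer
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:password@postgres-svc:5432/udaconnect")
consumer = KafkaConsumer(
    "locations",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

while True:
    # poll() returns {TopicPartition: [records]}; batching keeps the number
    # of database round trips small.
    batch = consumer.poll(timeout_ms=5000, max_records=500)
    rows = [record.value for records in batch.values() for record in records]
    if rows:
        with engine.begin() as conn:
            conn.execute(
                text(
                    "INSERT INTO location (person_id, latitude, longitude, creation_time) "
                    "VALUES (:person_id, :latitude, :longitude, :creation_time)"
                ),
                rows,  # executemany: one batched statement
            )
```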
The separation into different microservices requires that all participants use the same message models and schemas. To avoid duplicate definitions that are prone to inconsistencies, all schemas for REST, gRPC and database provisioning are centrally stored and updated in build/types. The CI workflow that builds the application images copies these schemas at build time. For the same reason, the Python gRPC objects for person-service and person-api are generated at build time, too.
The following architecture diagram shows the chosen microservice setup as explained above:
Architecture decisions for each module are also described in detail in docs/architecture_decisions.
The refactored application implements two REST APIs. The OpenAPI documentation is auto-generated using FastAPI Interactive Docs based on the Pydantic models in build/types/schemas.py.
Once you have deployed the application, the interactive SwaggerUI is available at the /docs endpoint for both APIs. You can also find a copy of both docs in docs/openapi.
- LocationAPI: localhost:30102/docs (or docs/openapi/LocationAPI.json)
- PersonAPI: localhost:30103/docs (or docs/openapi/PersonAPI.json)
A Postman collection of all API endpoints is provided in docs/postman.json.
All gRPC messages and services are defined in build/types/definitions.proto.
The *_pb2 and *_pb2_grpc files are not part of the application source code as they are generated during the image build process (cf. Dockerfiles for person-service and person-api apps and compile script).
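For reference, a typical invocation to generate those stubs with grpcio-tools looks like the following; the exact output paths in the repo's compile script may differ:

```bash
pip install grpcio-tools
python -m grpc_tools.protoc \
    -I build/types \
    --python_out=. \
    --grpc_python_out=. \
    build/types/definitions.proto
# produces definitions_pb2.py and definitions_pb2_grpc.py
```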
The gRPC server is launched in the person-service main module.
The gRPC client is launched in the person-api main module.
See also docs/grpc for further details.
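As a rough orientation, a gRPC server of this shape is typically launched like the sketch below. The module and service names (definitions_pb2_grpc, PersonService) are assumptions derived from the proto file location; only the port is taken from the endpoint list above.

```python
# Hypothetical skeleton of the person-service gRPC server startup.
from concurrent import futures

import grpc
import definitions_pb2_grpc  # generated at image build time

class PersonServicer(definitions_pb2_grpc.PersonServiceServicer):  # assumed service name
    ...  # handlers that read/write person and connection data via the database

def serve() -> None:
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    definitions_pb2_grpc.add_PersonServiceServicer_to_server(PersonServicer(), server)
    server.add_insecure_port("[::]:5005")  # matches person-service-svc:5005
    server.start()
    server.wait_for_termination()

if __name__ == "__main__":
    serve()
```

The person-api side would then open a channel with `grpc.insecure_channel("person-service-svc:5005")` and call the generated stub.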
Possible future improvements:
- Use secrets instead of configmaps for database credentials
- Use encrypted communication for Kafka messaging
- Specify resource requests for all pods
- Add readiness and liveness probes for pods
- Open Kafka/gRPC endpoints for external clients

