Task monitor daemon process may limit scalability #194

@uniqueg

Is your feature request related to a problem? Please describe.

Currently, updating the run status in the database involves sending a Celery signal that is picked up by a single task monitor daemon process spawned by the main application. Status updates may be numerous when many workflow runs are managed in parallel, and they may also contain long log messages, so this architecture may become a serious bottleneck when scaling up run throughput.

Describe the solution you'd like

To improve scalability, status updates could be handled by worker processes instead. A status update could be posted to the broker queue and picked up by a worker, rather than by the task monitor, which would then update the database. To ensure that ongoing workflow runs cannot block status updates, which would leave the service stuck indefinitely, a dedicated worker pool of at least size would need to be set aside for this purpose.
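The idea of a dedicated pool could be sketched roughly as follows. This is only an illustration using the Python standard library, not the service's actual code: `status_queue` stands in for the broker queue, `run_db` for the database, and all function names are hypothetical.

```python
import queue
import threading

# Hypothetical in-memory stand-ins for the broker queue and the run database.
status_queue = queue.Queue()
run_db = {}
db_lock = threading.Lock()


def status_worker():
    """Consume (run_id, state) updates until a None sentinel arrives."""
    while True:
        item = status_queue.get()
        try:
            if item is None:
                return
            run_id, state = item
            with db_lock:
                run_db[run_id] = state
        finally:
            # Mark the item as processed so that status_queue.join() can return.
            status_queue.task_done()


def start_status_pool(size):
    """Start `size` workers that only ever handle status updates."""
    threads = [threading.Thread(target=status_worker, daemon=True) for _ in range(size)]
    for thread in threads:
        thread.start()
    return threads


def stop_status_pool(threads):
    """Send one sentinel per worker and wait for all of them to exit."""
    for _ in threads:
        status_queue.put(None)
    for thread in threads:
        thread.join()
```

Because these workers never execute workflow runs, a backlog of long-running runs cannot starve them. Note that with more than one consumer, two updates for the same run may be applied out of order, so a real implementation would probably need to guard against that.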

Describe alternatives you've considered

As an alternative to setting aside a dedicated worker pool for status updates, status updates could also be handled directly by the worker processes that are already handling the workflow runs.
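A rough sketch of this alternative, again with standard-library stand-ins only: the worker that executes a run writes the status itself, so no separate consumer is involved. `run_db`, `set_status` and `run_workflow` are hypothetical names, and the state strings are used purely for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the run database.
run_db = {}
db_lock = threading.Lock()


def set_status(run_id, state):
    """Write the run state directly from the calling worker."""
    with db_lock:
        run_db[run_id] = state


def run_workflow(run_id, steps):
    """Execute the steps of a run and record its status along the way."""
    set_status(run_id, "RUNNING")
    try:
        for step in steps:
            step()
    except Exception:
        set_status(run_id, "EXECUTOR_ERROR")
        raise
    set_status(run_id, "COMPLETE")


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as workers:
        workers.submit(run_workflow, "run-a", [lambda: None]).result()
    print(run_db)
```

This avoids reserving a pool, but every workflow worker then needs database access, and a worker that dies mid-run cannot report its own failure.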

Additional context

It is important that the chosen solution is conceptually compatible with a future callback mechanism for status updates (see #57, ga4gh/task-execution-schemas#121, ga4gh/workflow-execution-service-schemas#133 & ga4gh/cloud-interop-testing#98 (comment)).
