Handling Long-Running Tasks in Azure App Service
Azure App Service has a hard limit of 230 seconds when processing long-running tasks. If a backend HTTP API exceeds this limit, the Azure Load Balancer will drop the connection. By default, the request fails, and it becomes impossible to track the task's success.
Note: Azure Functions with HTTP triggers share this same limitation because they reside behind the same Azure Load Balancer layer.
The Problem
Assume a task (Task A) requires 500 seconds to complete. If a user (User U) calls /api/task/A, the connection will time out before completion. Below are four architectural strategies to resolve this.
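The failure mode can be sketched as a toy, scaled-down simulation (milliseconds in place of seconds) using only Python's standard library; `GATEWAY_TIMEOUT` and `long_task` are illustrative names, not Azure APIs:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

GATEWAY_TIMEOUT = 0.23  # stands in for the 230 s load-balancer limit

def long_task():
    time.sleep(0.5)  # stands in for Task A's 500 s of work
    return "done"

with ThreadPoolExecutor() as pool:
    future = pool.submit(long_task)
    try:
        result = future.result(timeout=GATEWAY_TIMEOUT)
    except FutureTimeout:
        # The caller never learns the outcome; the work may still
        # be running on the server, but the connection is gone.
        result = "connection dropped"

print(result)  # connection dropped
```

The server-side work actually finishes, which is exactly the problem: success or failure happens after the client has already lost the connection.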
Idea 1: Task Splitting (Chain of Responsibility)
Split Task A into smaller subtasks (e.g., A1: 100s, A2: 80s, etc.) and use the Chain of Responsibility pattern to manage execution at the calling layer (User U).
- Pros: Minimal changes to core logic; you only need to modularize the task.
- Cons:
  - Complex logic can be difficult to split.
  - State/output must be stored between subtasks (e.g., in Azure Storage).
  - Inaccurate estimates can still lead to failure if a subtask accidentally exceeds 230s.
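A minimal sketch of the splitting idea: each subtask loads the previous state, does its slice of work, and persists the result before the next call. The subtask names and the JSON file (a stand-in for Azure Storage) are illustrative:

```python
import json
import os
import tempfile

# Hypothetical subtasks A1/A2; each must finish comfortably under 230 s.
def subtask_a1(state):
    state["a1_output"] = "partial-result"
    return state

def subtask_a2(state):
    state["a2_output"] = state["a1_output"] + ":finished"
    return state

CHAIN = [subtask_a1, subtask_a2]

def run_step(step, state_path):
    """Run one subtask, loading/saving state between invocations."""
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    else:
        state = {}
    state = step(state)
    with open(state_path, "w") as f:
        json.dump(state, f)
    return state

state_path = os.path.join(tempfile.mkdtemp(), "task_a_state.json")
for step in CHAIN:  # the caller (User U) drives the chain, one request per step
    final_state = run_step(step, state_path)

print(final_state["a2_output"])  # partial-result:finished
```

In the real setup, each loop iteration would be a separate HTTP request, so no single request ever approaches the 230-second limit.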
Idea 2: Azure Durable Functions
Migrate the logic to Azure Durable Functions. This framework is designed for stateful, long-running workflows.
Recommended Patterns:
- Pattern #1: Function Chaining: Similar to Idea 1, but managed internally by Azure.
- Pattern #3: Async HTTP APIs: The API triggers the task and immediately returns a 202 (Accepted) response. The client can then poll a status URL to check progress.
- Pros: No need to change the internal logic of Task A; natively handles long execution times.
- Cons: Requires migrating from App Service to Functions; requires implementing status-check logic on the client side.
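The Async HTTP API pattern can be sketched without any Azure dependency: the "trigger" returns 202 immediately, the work runs in the background, and the client polls a status record. `start_task`, `poll`, and the status URL are illustrative; Durable Functions generates and persists all of this for you:

```python
import threading
import time
import uuid

STATUS = {}  # instance_id -> record; Durable Functions persists this for you

def start_task(work):
    """Stand-in for the HTTP trigger: kick off work, return 202 at once."""
    instance_id = str(uuid.uuid4())
    STATUS[instance_id] = {"runtimeStatus": "Running", "output": None}

    def run():
        output = work()
        STATUS[instance_id] = {"runtimeStatus": "Completed", "output": output}

    threading.Thread(target=run).start()
    # Hypothetical status URL; the real one comes from the Durable runtime.
    return 202, f"/status/{instance_id}", instance_id

def poll(instance_id, interval=0.01):
    """Client-side loop: check the status record until the task completes."""
    while STATUS[instance_id]["runtimeStatus"] == "Running":
        time.sleep(interval)
    return STATUS[instance_id]["output"]

code, status_url, iid = start_task(lambda: sum(range(1001)))  # stand-in for Task A
assert code == 202  # the client gets an answer immediately, no timeout risk
print(poll(iid))    # 500500
```

The key design point: the HTTP connection only lives for the initial 202 response, so the 230-second limit never applies to the actual work.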
Idea 3: Queue-Based Worker Pattern
Move Task A to a background consumer using a Message Queue (e.g., Azure Event Hubs, RabbitMQ, or Kafka).
- Pros:
  - High observability (track success/failure per event).
  - Highly scalable; easily handles spikes in volume.
- Cons:
  - Additional costs (Event Hubs).
  - Increased infrastructure complexity (managing VMs for RabbitMQ/Kafka).
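The producer/consumer shape of this pattern, reduced to the standard library's `queue.Queue` as a stand-in for the message broker (the worker and result store are illustrative):

```python
import queue
import threading

def worker(tasks, results):
    """Consumer: pulls messages and records per-message success/failure."""
    while True:
        msg = tasks.get()
        if msg is None:  # shutdown sentinel, for this demo only
            tasks.task_done()
            break
        task_id, payload = msg
        try:
            results[task_id] = {"status": "ok", "value": payload * 2}  # Task A stand-in
        except Exception as exc:  # a real worker would retry or dead-letter
            results[task_id] = {"status": "failed", "error": str(exc)}
        tasks.task_done()

tasks, results = queue.Queue(), {}
threading.Thread(target=worker, args=(tasks, results), daemon=True).start()

for i in range(3):
    tasks.put((i, i))  # the HTTP API only enqueues, then returns immediately
tasks.put(None)
tasks.join()  # in production the worker simply runs forever

print(results[2]["value"])  # 4
```

Because every message carries its own ID and outcome, this is where the per-event observability advantage comes from.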
Idea 4: Virtual Machine (VM) Deployment
Move the entire web server and API logic to a dedicated Virtual Machine.
- Pros: Bypasses the Azure Load Balancer timeout; no artificial limits on request duration.
- Cons:
  - Higher management overhead (server setup/maintenance).
  - Manual domain/IP configuration.
  - Higher costs and no built-in High Availability (requires Scale Sets/Availability Zones).
Other Approaches (With Limitations)
| Approach | Limitation |
| --- | --- |
| FastAPI `BackgroundTasks` | The client receives a response while the server continues working, but there is no native way to monitor whether the task eventually succeeds or fails. |
| Azure WebJobs | Poor support for certain stacks (e.g., Python on Linux) and a smaller community compared to Functions. |