Handling Long-Running Tasks in Azure App Service
Azure App Service has a hard limit of 230 seconds when processing long-running tasks. If a backend HTTP API exceeds this limit, the Azure Load Balancer will drop the connection. By default, the request fails, and it becomes impossible to track the task's success.
Note: Azure Functions with HTTP triggers share this same limitation because they reside behind the same Azure Load Balancer layer.
The Problem
Assume a task (Task A) requires 500 seconds to complete. If a user (User U) calls /api/task/A, the connection will time out before completion. Below are four architectural strategies to resolve this.
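The failure mode can be sketched as a toy, scaled-down simulation (milliseconds in place of seconds) using only Python's standard library; `GATEWAY_TIMEOUT` and `long_task` are illustrative names, not Azure APIs:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

GATEWAY_TIMEOUT = 0.23  # stands in for the 230 s load-balancer limit

def long_task():
    time.sleep(0.5)  # stands in for Task A's 500 s of work
    return "done"

with ThreadPoolExecutor() as pool:
    future = pool.submit(long_task)
    try:
        result = future.result(timeout=GATEWAY_TIMEOUT)
    except FutureTimeout:
        # The caller never learns the outcome; the work may still
        # be running on the server, but the connection is gone.
        result = "connection dropped"

print(result)  # connection dropped
```

The server-side work actually finishes, which is exactly the problem: success or failure happens after the client has already lost the connection.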
Idea 1: Task Splitting (Chain of Responsibility)
Split Task A into smaller subtasks (e.g., A1: 100s, A2: 80s, etc.) and use the Chain of Responsibility pattern to manage execution at the calling layer (User U).
- Pros: Minimal changes to core logic; you only need to modularize the task.
- Cons:
  - Complex logic can be difficult to split.
  - State/output must be stored between subtasks (e.g., in Azure Storage).
  - Inaccurate estimates can still lead to failure if a subtask accidentally exceeds 230s.
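A minimal sketch of the splitting idea: each subtask loads the previous state, does its slice of work, and persists the result before the next call. The subtask names and the JSON file (a stand-in for Azure Storage) are illustrative:

```python
import json
import os
import tempfile

# Hypothetical subtasks A1/A2; each must finish comfortably under 230 s.
def subtask_a1(state):
    state["a1_output"] = "partial-result"
    return state

def subtask_a2(state):
    state["a2_output"] = state["a1_output"] + ":finished"
    return state

CHAIN = [subtask_a1, subtask_a2]

def run_step(step, state_path):
    """Run one subtask, loading/saving state between invocations."""
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    else:
        state = {}
    state = step(state)
    with open(state_path, "w") as f:
        json.dump(state, f)
    return state

state_path = os.path.join(tempfile.mkdtemp(), "task_a_state.json")
for step in CHAIN:  # the caller (User U) drives the chain, one request per step
    final_state = run_step(step, state_path)

print(final_state["a2_output"])  # partial-result:finished
```

In the real setup, each loop iteration would be a separate HTTP request, so no single request ever approaches the 230-second limit.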
Idea 2: Azure Durable Functions
Migrate the logic to Azure Durable Functions. This framework is designed for stateful, long-running workflows.
Recommended Patterns:
- Pattern #1: Function Chaining: Similar to Idea 1, but managed internally by Azure.
- Pattern #3: Async HTTP APIs: The API triggers the task and immediately returns a 202 (Accepted) response. The client can then poll a status URL to check progress.
- Pros: No need to change the internal logic of Task A; natively handles long execution times.
- Cons: Requires migrating from App Service to Functions; requires implementing status-check logic on the client side.
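The Async HTTP API pattern can be sketched without any Azure dependency: the "trigger" returns 202 immediately, the work runs in the background, and the client polls a status record. `start_task`, `poll`, and the status URL are illustrative; Durable Functions generates and persists all of this for you:

```python
import threading
import time
import uuid

STATUS = {}  # instance_id -> record; Durable Functions persists this for you

def start_task(work):
    """Stand-in for the HTTP trigger: kick off work, return 202 at once."""
    instance_id = str(uuid.uuid4())
    STATUS[instance_id] = {"runtimeStatus": "Running", "output": None}

    def run():
        output = work()
        STATUS[instance_id] = {"runtimeStatus": "Completed", "output": output}

    threading.Thread(target=run).start()
    # Hypothetical status URL; the real one comes from the Durable runtime.
    return 202, f"/status/{instance_id}", instance_id

def poll(instance_id, interval=0.01):
    """Client-side loop: check the status record until the task completes."""
    while STATUS[instance_id]["runtimeStatus"] == "Running":
        time.sleep(interval)
    return STATUS[instance_id]["output"]

code, status_url, iid = start_task(lambda: sum(range(1001)))  # stand-in for Task A
assert code == 202  # the client gets an answer immediately, no timeout risk
print(poll(iid))    # 500500
```

The key design point: the HTTP connection only lives for the initial 202 response, so the 230-second limit never applies to the actual work.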
Idea 3: Queue-Based Worker Pattern
Move Task A to a background consumer using a Message Queue (e.g., Azure Event Hubs, RabbitMQ, or Kafka).
- Pros:
  - High observability (track success/failure per event).
  - Highly scalable; easily handles spikes in volume.
- Cons:
  - Additional costs (Event Hubs).
  - Increased infrastructure complexity (managing VMs for RabbitMQ/Kafka).
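The producer/consumer shape of this pattern, reduced to the standard library's `queue.Queue` as a stand-in for the message broker (the worker and result store are illustrative):

```python
import queue
import threading

def worker(tasks, results):
    """Consumer: pulls messages and records per-message success/failure."""
    while True:
        msg = tasks.get()
        if msg is None:  # shutdown sentinel, for this demo only
            tasks.task_done()
            break
        task_id, payload = msg
        try:
            results[task_id] = {"status": "ok", "value": payload * 2}  # Task A stand-in
        except Exception as exc:  # a real worker would retry or dead-letter
            results[task_id] = {"status": "failed", "error": str(exc)}
        tasks.task_done()

tasks, results = queue.Queue(), {}
threading.Thread(target=worker, args=(tasks, results), daemon=True).start()

for i in range(3):
    tasks.put((i, i))  # the HTTP API only enqueues, then returns immediately
tasks.put(None)
tasks.join()  # in production the worker simply runs forever

print(results[2]["value"])  # 4
```

Because every message carries its own ID and outcome, this is where the per-event observability advantage comes from.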
Idea 4: Virtual Machine (VM) Deployment
Move the entire web server and API logic to a dedicated Virtual Machine.
- Pros: Bypasses the Azure Load Balancer timeout; no artificial limits on request duration.
- Cons:
  - Higher management overhead (server setup/maintenance).
  - Manual domain/IP configuration.
  - Higher costs and no built-in High Availability (requires Scale Sets/Availability Zones).
Other Approaches (With Limitations)
| Approach | Limitation |
| --- | --- |
| FastAPI `BackgroundTasks` | The client receives a response while the server continues working, but there is no native way to monitor whether the task eventually succeeds or fails. |
| Azure WebJobs | Poor support for certain stacks (e.g., Python on Linux) and a smaller community compared to Functions. |