Study - Simulating Queuing Systems

We have seen that a practical way to regard distributed systems is as queuing systems with hidden queues.

A practical queuing model, with hidden queues between Client and Server

In this note, we study the behavior of such a system by simulation.

Setup

We simulate a workload against a queuing system with N=10 workers and a constant service time of 10ms. Each worker can complete up to 100 requests per second, so we expect the system to handle at most 1k requests per second.
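The capacity figure follows directly from the worker count and the service time. A minimal sketch of that arithmetic, where the function name `capacity_rps` is a hypothetical helper introduced only for illustration:

```python
def capacity_rps(n_workers: int, service_time_ms: float) -> float:
    """Maximum sustainable throughput in requests per second.

    Each worker finishes one request every service_time_ms milliseconds,
    so it completes 1000 / service_time_ms requests per second.
    """
    return n_workers * 1000 / service_time_ms


def offered_rate(load: float, n_workers: int = 10, service_time_ms: float = 10) -> float:
    """Request rate that corresponds to a given load (fraction of capacity)."""
    return load * capacity_rps(n_workers, service_time_ms)


if __name__ == "__main__":
    print(capacity_rps(10, 10))   # capacity of the system described above
    for load in (0.5, 0.9, 0.99):
        print(load, offered_rate(load))
```

The load levels studied in the following sections (50%, 90%, 99%) are fractions of this capacity.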

We collect the following telemetry on client and server side:

Metric Type	on Client	on Server
Requests	Requests Sent	Requests Arrived (at service worker)
Concurrency	Concurrent Requests	Active Workers
Latency	Response Time	Service Time

We don't capture information about the queue length, as most of our queues are hidden in practice.
The service time is a constant 10ms, so we expect the Service Time graph to be a flat line.
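The setup described above can be sketched in a few lines. This is not the simulator used for this study, only a minimal illustration under stated assumptions: requests arrive as a Poisson process (the arrival pattern is an assumption, it is not specified here), the hidden queue is first-come-first-served, and the names `simulate` and `mean_response_ms` are hypothetical.

```python
import heapq
import random


def simulate(load, n_workers=10, service_time=0.010, n_requests=20_000, seed=1):
    """Simulate a FIFO queue served by n_workers with constant service time.

    Assumption: Poisson arrivals (exponential inter-arrival times).
    Returns one (wait, response_time) tuple per request, in seconds.
    """
    rng = random.Random(seed)
    capacity = n_workers / service_time       # requests per second
    arrival_rate = load * capacity
    # Min-heap holding the time at which each worker becomes free.
    free_at = [0.0] * n_workers
    heapq.heapify(free_at)
    now = 0.0
    results = []
    for _ in range(n_requests):
        now += rng.expovariate(arrival_rate)  # client sends the next request
        worker_free = heapq.heappop(free_at)  # worker that frees up first
        start = max(now, worker_free)         # wait in the queue if all are busy
        finish = start + service_time
        heapq.heappush(free_at, finish)
        results.append((start - now, finish - now))
    return results


def mean_response_ms(results):
    """Average client-side response time in milliseconds."""
    return 1000 * sum(resp for _, resp in results) / len(results)


if __name__ == "__main__":
    for load in (0.5, 0.9, 0.99):
        print(f"load {load:.0%}: mean response {mean_response_ms(simulate(load)):.2f} ms")
```

Assigning each request to the worker that frees up first is enough to model a shared FIFO queue, because requests are processed in arrival order. The wait component is exactly what the server-side telemetry cannot see.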

Queuing System at 50% Load

As long as we have enough free workers, all requests get serviced immediately and the number of concurrent requests equals the number of active workers.
There is little queuing delay, so the response time is nearly equal to the service time.

Queuing System at 90% load

There are on average ~12 requests pending, while just under 10 workers are utilized.
We start seeing some queuing. Some requests have to wait double the time before getting serviced.
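Little's Law (average concurrency = arrival rate x average time in system) ties these numbers together. The sketch below assumes that the ~12 pending requests are the time-averaged concurrent requests seen by the client and that 90% load means 900 requests per second; the function names are hypothetical.

```python
def mean_time_in_system(avg_concurrency: float, arrival_rate: float) -> float:
    """Little's Law solved for time: W = L / lambda (seconds)."""
    return avg_concurrency / arrival_rate


def mean_busy_workers(arrival_rate: float, service_time: float) -> float:
    """Little's Law applied to the workers alone: lambda * service time."""
    return arrival_rate * service_time


arrival_rate = 900.0        # assumed: 90% of 1k requests per second
service_time = 0.010        # constant 10ms

response = mean_time_in_system(12, arrival_rate)          # about 13.3ms
queue_delay = response - service_time                     # about 3.3ms on average
busy = mean_busy_workers(arrival_rate, service_time)      # about 9 workers

if __name__ == "__main__":
    print(f"mean response {1000 * response:.1f} ms, "
          f"mean queuing delay {1000 * queue_delay:.1f} ms, "
          f"busy workers {busy:.1f}")
```

Under these assumptions the average request spends roughly a third of the service time queued, even though individual requests can wait much longer.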

Queuing System at 99% load

At 99% capacity, the response times are dominated by queuing. In some cases, requests waited 250ms before getting serviced (in 10ms).
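Outliers like these disappear in an average, so the waiting times are better summarized by percentiles and the maximum. A minimal sketch using the nearest-rank method; `percentile` and `summarize_waits` are hypothetical helpers, and the input is assumed to be a list of per-request waiting times in milliseconds.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]


def summarize_waits(waits_ms):
    """Median, 99th percentile and maximum of the waiting times."""
    return {
        "p50": percentile(waits_ms, 50),
        "p99": percentile(waits_ms, 99),
        "max": max(waits_ms),
    }
```

A large gap between the median and the 99th percentile is the signature of a system whose response time is dominated by queuing.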

Queuing at the Edge

Queuing over the Edge