Latency

Why API latency matters

Even a tiny delay in API latency has a big impact on users, businesses, and how well technical systems work.

Impact on user experience

Every millisecond of delay is important. Research shows that once a delay is more than 100 milliseconds (ms), users start to notice it. Around 300 ms, this slowness becomes very noticeable and annoying. For apps that you interact with constantly, like online shopping, search engines, or real-time services, even small delays can make users leave or stop using the app:

Online shopping: Just a 100 ms slowdown can cause about a 7% drop in sales.
Crucial systems: Services like video streaming, online gaming, ride-hailing, and stock trading need extremely low latency to work properly and stay competitive.

Business & financial stakes

Latency isn’t just a technical problem; it directly costs businesses money. For example, Amazon reported that every extra 100 ms of delay cost it about 1% in sales, and Google saw traffic drop by roughly 20% when page load times increased by 500 ms.

Ultimately, unhappy customers, less engagement, and failing to meet service-level agreements (SLAs) can directly hurt a company’s earnings.

Technical & system implications

APIs are often the core of complex systems where different parts work together. Latency adds up as a request travels through many layers, such as different small services, API gateways, security checks, and proxies. This means a small latency issue can become a big problem across the whole system.

High latency also uses more computing power because it keeps resources (like processing threads and database connections) busy for longer. This reduces how much work your system can do overall and increases the cost of running your infrastructure.

Components & root causes of latency

Latency (delay) in an API request comes from several places. Understanding each of them helps you figure out where to focus your optimisation work.

Network latency

This is the time it takes for data to travel over the internet itself.

Propagation delay: The time it takes for data to physically travel the distance between your computer and the server. Even at the speed of light, there’s a delay over long distances.
Transmission delay: The time needed to actually put all the bits of information onto the network cable or wireless connection.
Processing & queuing delays: When data packets hit routers and switches on the internet, these devices need to inspect and forward them. If there’s too much traffic, packets can get stuck waiting in “queues” (like a traffic jam), adding to the delay.
Setup times: Before any actual data can be sent, there are initial handshakes and security setups. This includes:
      – DNS resolution: Looking up the server’s IP address from its human-readable name.
      – TCP handshake: The initial three-way conversation to establish a connection.
      – TLS negotiation: The process of setting up the secure encrypted connection (for HTTPS). Each of these steps can add tens to hundreds of milliseconds.
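
If you want to see these setup costs for yourself, Node.js exposes events on the underlying socket that mark when each phase finishes. The following is a rough sketch, assuming Node.js 16 or newer and a fresh connection rather than a reused keep-alive socket; any HTTPS endpoint will do:

  // Break an HTTPS request's setup time into DNS, TCP, and TLS phases.
  import https from "node:https";

  const start = performance.now();
  let dnsDone = start;
  let tcpDone = start;

  const req = https.request("https://example.com/", (res) => {
    res.once("data", () => {
      console.log(`first byte (TTFB): ${(performance.now() - start).toFixed(1)} ms`);
    });
    res.resume(); // drain the rest of the body so the request can finish cleanly
  });

  req.on("socket", (socket) => {
    socket.once("lookup", () => {        // DNS resolution finished
      dnsDone = performance.now();
      console.log(`DNS lookup: ${(dnsDone - start).toFixed(1)} ms`);
    });
    socket.once("connect", () => {       // TCP handshake finished
      tcpDone = performance.now();
      console.log(`TCP connect: ${(tcpDone - dnsDone).toFixed(1)} ms`);
    });
    socket.once("secureConnect", () => { // TLS negotiation finished
      console.log(`TLS handshake: ${(performance.now() - tcpDone).toFixed(1)} ms`);
    });
  });

  req.end();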

Queuing delays

These delays happen when requests have to wait in line.

– When the system handling API requests (the “frontend”) is overloaded, or when requests pass through proxies or gateways, they get put into queues.
– A high amount of traffic or work on the system makes these queues longer, increasing delays. This waiting can happen at various points: in network devices, on the application servers themselves, or within middleware software.

Server-side processing

This is the time the server takes to actually do the work you asked for.

Execution time: How long it takes for the server’s code to run the logic, call other services if needed, and prepare the response data.
Database operations: If the API needs to get data from a database, actions like searching through unindexed data, waiting for database “locks” to be released, or slow hard drives can add significant delays.
Inefficient code: Poorly written algorithms, slow ways of converting data (serialisation), or waiting on other slow external services can all make server processing take longer.
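
To find out which of these dominates, it helps to time each stage of a handler separately. Here is a minimal sketch; the two helpers are stand-ins for a real database call and real business logic, not any particular framework's API:

  // Stand-ins for real work; replace with your own database access and logic.
  const fetchFromDatabase = (query: string) =>
    new Promise<number[]>((resolve) => setTimeout(() => resolve([1, 2, 3]), 40));
  const applyBusinessLogic = (rows: number[]) => rows.map((n) => n * 2);

  async function handleRequest(query: string): Promise<string> {
    const t0 = performance.now();
    const rows = await fetchFromDatabase(query);   // database operation
    const t1 = performance.now();
    const result = applyBusinessLogic(rows);       // in-process execution
    const t2 = performance.now();
    const body = JSON.stringify(result);           // serialisation
    const t3 = performance.now();

    console.log(
      `db=${(t1 - t0).toFixed(1)} ms, logic=${(t2 - t1).toFixed(1)} ms, serialise=${(t3 - t2).toFixed(1)} ms`
    );
    return body;
  }

  handleRequest("SELECT 1").then(console.log);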

API gateway & middleware

Many APIs have extra layers between your app and the main server.

– Layers like API gateways, systems that check who you are (authentication), systems that check what you’re allowed to do (authorisation), routing services, data transformations, and logging all examine and change requests. Each of these steps takes some time.
– While each extra service or layer only adds a small amount of latency, these small amounts can add up to a significant total delay across the entire request path.
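
A toy sketch of that accumulation: each layer below just pauses for a few milliseconds to stand in for real gateway, authentication, or logging work, and the pauses add up along the chain (the layer names and costs are made up):

  type Handler = (request: string) => Promise<string>;

  // Each layer pretends to do gateway/auth/logging work by waiting a few milliseconds.
  const layer = (name: string, costMs: number) => (next: Handler): Handler =>
    async (request) => {
      await new Promise((resolve) => setTimeout(resolve, costMs));
      console.log(`${name}: +${costMs} ms`);
      return next(request);
    };

  const backend: Handler = async () => "response";
  const pipeline = layer("gateway", 5)(layer("auth", 10)(layer("logging", 3)(backend)));

  const t0 = performance.now();
  pipeline("GET /fixtures").then(() => {
    console.log(`total overhead from the layers: ~${(performance.now() - t0).toFixed(0)} ms`);
  });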

Client-side delays

Even on the user’s side, there can be delays.

Network conditions: The quality of the user’s internet connection (e.g., slow mobile data or crowded public Wi-Fi) can cause unpredictable delays.
Client processing: Once your app receives the data, it still needs to process it. This includes things like converting JSON data into something your app can use, rendering the user interface, or bundling multiple tasks, all of which can delay when the response is fully displayed to the user.
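
One way to see this on the client is to take a few timestamps around a fetch call, so you can tell how much of the wait was network versus parsing. A rough browser-side sketch (the URL is a placeholder for any JSON endpoint your app calls):

  async function timedFetch(url: string) {
    const t0 = performance.now();
    const response = await fetch(url);   // resolves once the response headers arrive
    const t1 = performance.now();
    const data = await response.json();  // body download plus JSON parsing
    const t2 = performance.now();

    console.log(`headers after ${(t1 - t0).toFixed(1)} ms, body parsed ${(t2 - t1).toFixed(1)} ms later`);
    return data;
  }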

Measuring & monitoring latency

Accurately measuring and constantly monitoring API latency is crucial for finding performance issues and keeping users happy.

Key metrics & tools

Time to first byte (TTFB)
This measures the time from when your request is sent until you receive the very first piece of the server’s response. It tells you about network delays and how fast the server initially starts processing. An ideal TTFB is typically less than 200 milliseconds.

Total response time
This is the full time from when you send a request until you get the complete response back. Breaking this down into “queue time” (how long it waited) versus “processing time” (how long the server worked on it) helps pinpoint exactly where the slowdown is.

Percentile metrics (P50, P90, P95, P99)
These are much more informative than simple averages.
– P50 (50th percentile) is the median time, meaning 50% of requests are faster than this.
– P95 (95th percentile) and P99 (99th percentile) are especially important. They show you “tail latency”: the performance experienced by the slowest 5% or 1% of requests. This uncovers issues that affect only a small but significant group of users.
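
If you already collect raw timings, percentiles are straightforward to derive: sort the samples and read the value at the matching rank. A small sketch using the nearest-rank method (the sample numbers are made up):

  // Compute latency percentiles from a list of raw timings in milliseconds.
  function percentile(samples: number[], p: number): number {
    const sorted = [...samples].sort((a, b) => a - b);
    const index = Math.ceil((p / 100) * sorted.length) - 1;
    return sorted[Math.max(0, index)];
  }

  const latencies = [42, 38, 55, 40, 300, 41, 47, 39, 120, 44]; // example data
  console.log(`P50=${percentile(latencies, 50)} ms  P95=${percentile(latencies, 95)} ms  P99=${percentile(latencies, 99)} ms`);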

Monitoring approaches

Synthetic monitoring
This involves setting up automated “robots” that pretend to be users. They run specific actions (like logging in or fetching data) against your API at regular times from different locations around the world. This helps you track TTFB and response times from an external perspective. Tools for this include New Relic Synthetics, Datadog, k6, Locust, and JMeter.
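
As a rough example, a synthetic k6 check might look like the sketch below; the URL and the 200 ms limit are placeholders, and you would run it on a schedule with `k6 run script.js`:

  import http from "k6/http";
  import { check, sleep } from "k6";

  export const options = {
    vus: 1,
    duration: "1m",
  };

  export default function () {
    const res = http.get("https://api.example.com/health");
    check(res, {
      "status is 200": (r) => r.status === 200,
      "TTFB under 200 ms": (r) => r.timings.waiting < 200, // k6 reports TTFB as timings.waiting
    });
    sleep(5); // probe every few seconds rather than hammering the endpoint
  }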

Real-time observability & metrics
This involves putting special code (agents or “instrumentation”) directly into your application using tools like OpenTelemetry, Prometheus, or Datadog. These tools collect timing data for every single request as it happens. This allows you to calculate percentile metrics and view live dashboards in tools like Grafana, New Relic, or AppDynamics.
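
As an illustration, request timings can be recorded as a Prometheus histogram with the prom-client package for Node.js; the metric name, labels, buckets, and the fake request handler below are only examples:

  import client from "prom-client";

  const httpDuration = new client.Histogram({
    name: "http_request_duration_seconds",
    help: "Duration of handled HTTP requests in seconds",
    labelNames: ["route", "status"],
    buckets: [0.05, 0.1, 0.3, 0.5, 1, 2], // seconds
  });

  async function handle(route: string) {
    const stopTimer = httpDuration.startTimer({ route }); // starts the clock
    await new Promise((resolve) => setTimeout(resolve, 120)); // stand-in for real work
    stopTimer({ status: "200" });                          // records the elapsed time
  }

  // After some traffic, this text is what Prometheus would scrape from /metrics.
  handle("/fixtures").then(async () => {
    console.log(await client.register.metrics());
  });

Grafana (or any PromQL client) can then chart P95/P99 from those histogram buckets using the histogram_quantile function.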

Distributed tracing
For complex systems with many interconnected services (microservices), distributed tracing lets you see the detailed journey of a single request. It captures how much time is spent at each step, from one microservice to another, to database calls, and external integrations, helping you pinpoint exactly where delays occur.
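
A minimal sketch of what that instrumentation can look like with the OpenTelemetry API for Node.js; it assumes an OpenTelemetry SDK and exporter are configured elsewhere, and the span names and fake database call are illustrative:

  import { trace } from "@opentelemetry/api";

  const tracer = trace.getTracer("fixtures-service"); // illustrative service name

  async function getFixture(id: number) {
    // The outer span covers the whole request; the inner span isolates the DB call,
    // so the resulting trace shows exactly where the time went.
    return tracer.startActiveSpan("GET /fixtures/:id", async (span) => {
      const row = await tracer.startActiveSpan("db.query", async (dbSpan) => {
        const result = await new Promise((resolve) => setTimeout(() => resolve({ id }), 30)); // stand-in DB call
        dbSpan.end();
        return result;
      });
      span.end();
      return row;
    });
  }

  getFixture(42).then(console.log);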

Establishing baselines & alerts

Set meaningful thresholds (SLA/SLO): Define clear goals for your API’s performance. For example, for real-time APIs, you might aim for P95 < 100 ms and P99 < 300 ms. For less time-sensitive APIs, higher values might be acceptable.
Alert on percentile breaches: Set up rules to automatically send alerts when your latency metrics (especially percentiles) go above your defined thresholds. You can do this with tools like Prometheus or Datadog/New Relic.
Optimise alert responsiveness: Use “rolling windows” (e.g., alert if P99 is above the threshold for 5 consecutive minutes) to avoid getting too many alerts from minor, temporary spikes.

Load testing & CI integration

Pre-production benchmarking: Before you launch your API, use tools like k6, Gatling, or Locust to simulate a lot of user traffic. This helps you record how latency behaves under stress and ensure your percentile goals are met before your API goes live.
CI/CD pre-merge checks: Integrate latency tests directly into your development pipeline (CI/CD). This means that before any new code is added to the main project, automated checks will run. If a change causes performance to get worse (increases latency), it can be stopped from reaching your production environment.
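
With k6, for instance, those percentile goals can be written as thresholds, so the test run exits with a failure when they are breached and a CI job can block the merge. A sketch with placeholder URL and limits:

  import http from "k6/http";

  export const options = {
    vus: 50,
    duration: "2m",
    thresholds: {
      http_req_duration: ["p(95)<100", "p(99)<300"], // milliseconds
      http_req_failed: ["rate<0.01"],                // less than 1% errors
    },
  };

  export default function () {
    http.get("https://api.example.com/fixtures");
  }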

Sportmonks API latency

Sportmonks, a major provider of sports data, takes API latency very seriously. This is because our clients need real-time performance for things like live scores, player stats, and betting odds.

Request filtering to reduce data and latency

Our documentation shows “Request Options” that allow developers to select only the specific data fields they need and ignore everything else. This helps make the data packets smaller and speeds up how fast information travels over the internet.
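
In practice that can be as simple as adding the right query parameters to your request. The sketch below is illustrative only; the endpoint path and parameter names should be checked against the Sportmonks request-options documentation for your plan:

  const url = new URL("https://api.sportmonks.com/v3/football/fixtures");
  url.searchParams.set("api_token", process.env.SPORTMONKS_TOKEN ?? "");
  url.searchParams.set("select", "name,starting_at"); // only the fixture fields you need
  url.searchParams.set("include", "scores");          // only the relations you need

  fetch(url)
    .then((response) => response.json())
    .then(({ data }) => console.log(data));

Smaller responses also take less time to parse on the client, so the saving shows up twice.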

Premium odds feed for speed

For applications where speed is absolutely critical (like real-time betting), Sportmonks partners with TXODDS to provide a “Premium Odds Feed.” This feed delivers updates with extremely low latency, which is essential when even tiny delays can change betting outcomes.

Rate limits for stable performance

Our system limits how many requests you can make (3,000 calls per endpoint per hour). This protects our infrastructure when there’s a lot of activity, preventing slowdowns and ensuring our services remain fast and reliable. Clients receive information in our responses that helps them manage their requests and slow down gracefully if they are approaching or hitting these limits.
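
Slowing down gracefully can be as simple as checking the rate-limit details returned with each response before firing the next request. A hedged sketch (the field names here are illustrative, so check the actual response shape in the Sportmonks documentation):

  async function fetchWithBackoff(url: string) {
    const response = await fetch(url);
    const body = await response.json();

    const remaining = body?.rate_limit?.remaining ?? Infinity;
    const resetsInSeconds = body?.rate_limit?.resets_in_seconds ?? 0;

    if (remaining < 50) {
      // Close to the limit: spread the remaining calls over the rest of the window
      // instead of being rejected and having to retry.
      const waitMs = (resetsInSeconds * 1000) / Math.max(remaining, 1);
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
    return body;
  }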

Get faster results with Sportmonks

At Sportmonks, we keep things fast. Use include, select, and filters to reduce payload size, and rely on our Premium Odds Feed for lightning-fast updates when milliseconds count. Our clear rate limits and optimised endpoints ensure smooth, scalable performance.

Power your app with speed and explore Sportmonks’ real-time data today.

FAQs about latency

How to fix API latency?
– Optimise the network: Use CDNs, persistent TCP/TLS connections, and active queue management to reduce travel and handshake delays.
– Trim payloads: Request only the fields you need, compress responses, and use efficient formats like JSON or Protobuf.
– Optimise servers: Cache repeatedly-used data, tune database queries, index tables, use connection pools, and employ asynchronous handling.
– Streamline gateways: Enable HTTP/2 or HTTP/3, reduce middleware layers, and route authentication or transformations closer to the edge.
What is a good latency for API calls?
Targets vary, but in general:
– Under 100 ms is perceived as instant.
– 100–300 ms still feels smooth.
– Above 300 ms, users begin to notice lag; 800 ms+ can feel slow.
For real-time systems, aim for sub-10 ms or even microseconds.
What is latency in HTTP?
Latency in HTTP refers to the time taken for a request to travel from the client to the server and for the first byte of the response to return; this includes DNS resolution, the TCP handshake, TLS setup, and network delay.
What is the delay time for an API?
The delay time (total API response time) is the sum of:
  1. Queue time (waiting before processing),
  2. Service/processing time,
  3. Network delay (round‑trip).
Latency is usually measured up to the arrival of the first byte, while the full delay continues until the last byte is received. An ideal TTFB is < 200 ms, depending on the use case.

Written by David Jaja

David Jaja is a technical content manager at Sportmonks, where he makes complex football data easier to understand for developers and businesses. With a background in frontend development and technical writing, he helps bridge the gap between technology and sports data. Through clear, insightful content, he ensures Sportmonks' APIs are accessible and easy to use, empowering developers to build standout football applications.