Why API latency matters
Even a tiny amount of API latency has an outsized impact on users, on business results, and on how well technical systems perform.
Impact on user experience
Every millisecond of delay matters. Research shows that users start to notice delays beyond 100 milliseconds (ms), and around 300 ms the slowness becomes outright annoying. For apps people interact with constantly, like online shopping, search engines, or real-time services, even small delays can make users leave or stop using the app:
– Online shopping: Just a 100 ms slowdown can cause about a 7% drop in sales.
– Crucial systems: Services like video streaming, online gaming, ride-hailing, and stock trading need extremely low latency to work properly and stay competitive.
Business & financial stakes
Latency isn’t just a technical problem; it directly costs businesses money. Amazon famously found that every extra 100 ms of delay cost it about 1% in sales, and Google saw roughly 20% less traffic when page load times increased by 500 ms.
Ultimately, unhappy customers, lower engagement, and missed service-level agreements (SLAs) directly hurt a company’s earnings.
Technical & system implications
APIs are often the core of complex systems where different parts work together. Latency adds up as a request travels through many layers, such as different small services, API gateways, security checks, and proxies. This means a small latency issue can become a big problem across the whole system.
High latency also uses more computing power because it keeps resources (like processing threads and database connections) busy for longer. This reduces how much work your system can do overall and increases the cost of running your infrastructure.
Components & root causes of latency
Latency (delay) in an API request comes from several places, and understanding each of them helps you decide where to focus your optimisation efforts.
Network latency
This is the time it takes for data to travel over the internet itself.
– Propagation delay: The time it takes for data to physically travel the distance between your computer and the server. Even at the speed of light, there’s a delay over long distances.
– Transmission delay: The time needed to actually put all the bits of information onto the network cable or wireless connection.
– Processing & queuing delays: When data packets hit routers and switches on the internet, these devices need to inspect and forward them. If there’s too much traffic, packets can get stuck waiting in “queues” (like a traffic jam), adding to the delay.
– Setup times: Before any actual data can be sent, there are initial handshakes and security setups. This includes:
– DNS resolution: Looking up the server’s IP address from its human-readable name.
– TCP handshake: The initial three-way conversation to establish a connection.
– TLS negotiation: The process of setting up the secure encrypted connection (for HTTPS). Each of these steps can add tens to hundreds of milliseconds.
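You can get a feel for these setup costs by timing each phase separately. Here is a minimal Python sketch (the hostname is a placeholder) that measures DNS resolution, the TCP handshake, and TLS negotiation one step at a time:

```python
import socket
import ssl
import time

host = "api.example.com"  # placeholder hostname

t0 = time.perf_counter()
ip = socket.getaddrinfo(host, 443)[0][4][0]             # DNS resolution
t1 = time.perf_counter()

sock = socket.create_connection((ip, 443), timeout=5)   # TCP three-way handshake
t2 = time.perf_counter()

ctx = ssl.create_default_context()
tls = ctx.wrap_socket(sock, server_hostname=host)       # TLS negotiation
t3 = time.perf_counter()
tls.close()

print(f"DNS:           {(t1 - t0) * 1000:.1f} ms")
print(f"TCP handshake: {(t2 - t1) * 1000:.1f} ms")
print(f"TLS setup:     {(t3 - t2) * 1000:.1f} ms")
```

On a nearby, warmed-up server these phases may total only a few tens of milliseconds; across continents, or with a cold DNS cache, they can reach hundreds.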
Queuing delays
These delays happen when requests have to wait in line.
– When the system handling API requests (the “frontend”) is overloaded, or when requests pass through proxies or gateways, they get put into queues.
– A high amount of traffic or work on the system makes these queues longer, increasing delays. This waiting can happen at various points: in network devices, on the application servers themselves, or within middleware software.
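The non-linear nature of queuing is easy to underestimate. A simplified single-server queue model (M/M/1, used here purely for illustration) shows how the expected wait explodes as utilisation approaches 100%:

```python
# Expected queueing delay for a single-server (M/M/1) queue:
# Wq = rho / (mu * (1 - rho)), where mu is the service rate and rho the utilisation.
def queue_wait_ms(service_time_ms: float, utilisation: float) -> float:
    mu = 1.0 / service_time_ms            # requests the server can finish per ms
    return utilisation / (mu * (1.0 - utilisation))

for u in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"utilisation {u:.0%}: ~{queue_wait_ms(10, u):.0f} ms spent queuing")
```

With a 10 ms service time, the expected wait grows from about 10 ms at 50% utilisation to nearly a full second at 99%, which is why a modest traffic increase on an already-busy system can come to dominate total latency.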
Server-side processing
This is the time the server takes to actually do the work you asked for.
– Execution time: How long it takes for the server’s code to run the logic, call other services if needed, and prepare the response data.
– Database operations: If the API needs to get data from a database, actions like searching through unindexed data, waiting for database “locks” to be released, or slow hard drives can add significant delays.
– Inefficient code: Poorly written algorithms, slow ways of converting data (serialisation), or waiting on other slow external services can all make server processing take longer.
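A practical first step is to make the server report where its own time goes. The sketch below (fetch_rows and transform are hypothetical stand-ins) times the database, business-logic, and serialisation phases separately:

```python
import json
import time

def handle_request(fetch_rows, transform):
    """Hypothetical handler that records where server-side time is spent."""
    timings = {}

    t = time.perf_counter()
    rows = fetch_rows()                                      # database call
    timings["db_ms"] = (time.perf_counter() - t) * 1000

    t = time.perf_counter()
    result = transform(rows)                                 # business logic
    timings["logic_ms"] = (time.perf_counter() - t) * 1000

    t = time.perf_counter()
    body = json.dumps(result)                                # serialisation
    timings["serialise_ms"] = (time.perf_counter() - t) * 1000

    print(timings)   # in production, ship these to your metrics system instead
    return body
```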
API gateway & middleware
Many APIs have extra layers between your app and the main server.
– Layers like API gateways, authentication (checking who you are), authorisation (checking what you’re allowed to do), routing services, data transformations, and logging all examine and modify requests. Each of these steps takes some time.
– While each extra service or layer only adds a small amount of latency, these small amounts can add up to a significant total delay across the entire request path.
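As a back-of-the-envelope illustration (the per-layer numbers below are invented, not measured), summing plausible overheads shows how quickly the middleware tax accumulates:

```python
# Illustrative per-layer overheads in ms; real values vary widely by deployment.
layers = {
    "API gateway": 5,
    "authentication": 8,
    "authorisation": 3,
    "routing": 2,
    "transformation": 4,
    "logging": 1,
}
print(f"Middleware alone adds ~{sum(layers.values())} ms per request")  # ~23 ms
```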
Client-side delays
Even on the user’s side, there can be delays.
– Network conditions: The quality of the user’s internet connection (e.g., slow mobile data or crowded public Wi-Fi) can cause unpredictable delays.
– Client processing: Once your app receives the data, it still needs to process it. This includes things like converting JSON data into something your app can use, rendering the user interface, or bundling multiple tasks, all of which can delay when the response is fully displayed to the user.
Measuring & monitoring latency
Accurately measuring and constantly monitoring API latency is crucial for finding performance issues and keeping users happy.
Key metrics & tools
Time to first byte (TTFB)
This measures the time from when your request is sent until you receive the very first piece of the server’s response. It reflects network delays and how quickly the server begins processing. An ideal TTFB is typically under 200 milliseconds.
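You can approximate TTFB yourself by timing how long the first byte of the response body takes to arrive. A minimal sketch with Python’s requests library (the URL is a placeholder, and reading the first body chunk is a close approximation of true TTFB):

```python
import time
import requests  # third-party: pip install requests

url = "https://api.example.com/v1/status"  # placeholder endpoint

start = time.perf_counter()
with requests.get(url, stream=True, timeout=10) as resp:
    next(resp.iter_content(chunk_size=1), None)      # first response byte arrives
    ttfb_ms = (time.perf_counter() - start) * 1000

    for _ in resp.iter_content(chunk_size=8192):     # drain the rest of the body
        pass
    total_ms = (time.perf_counter() - start) * 1000

print(f"TTFB: {ttfb_ms:.0f} ms, total response time: {total_ms:.0f} ms")
```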
Total response time
This is the full time from when you send a request until you receive the complete response. Breaking this down into “queue time” (how long the request waited) versus “processing time” (how long the server worked on it) helps pinpoint exactly where the slowdown is.
Percentile metrics (P50, P90, P95, P99)
These are much better than simple averages.
– P50 (50th percentile) is the median time, meaning 50% of requests are faster than this.
– P95 (95th percentile) and P99 (99th percentile) are especially important. They show you “tail latency”, which highlights performance for the slowest 5% or 1% of your users. This uncovers issues that affect only a small but significant group of users.
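A short sketch with made-up samples makes the difference concrete: a couple of slow outliers barely move the median but dominate the tail, and the mean describes neither experience well:

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile of a list of latency samples (in ms)."""
    ordered = sorted(samples_ms)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

latencies = [42, 45, 44, 47, 51, 48, 43, 120, 46, 980]    # invented samples
for p in (50, 90, 95, 99):
    print(f"P{p}: {percentile(latencies, p)} ms")
print(f"mean: {sum(latencies) / len(latencies):.0f} ms")  # skewed by the outlier
```

Here the median (P50) is 46 ms, yet the mean is about 147 ms and the P99 is 980 ms: averages hide exactly the users that tail percentiles reveal.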
Monitoring approaches
Synthetic monitoring
This involves setting up automated “robots” that pretend to be users. They run specific actions (like logging in or fetching data) against your API at regular times from different locations around the world. This helps you track TTFB and response times from an external perspective. Tools for this include New Relic Synthetics, Datadog, k6, Locust, and JMeter.
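As an example, a minimal Locust scenario (the endpoint path is illustrative) that replays one user action on a schedule looks like this:

```python
# Minimal Locust scenario: pip install locust, then
# run with: locust -f synthetic.py --host https://api.example.com
from locust import HttpUser, task, between

class SyntheticUser(HttpUser):
    wait_time = between(1, 3)   # pause 1–3 s between actions, like a real user

    @task
    def fetch_livescores(self):
        self.client.get("/v1/livescores", name="livescores")
```

Locust then reports response times, including percentiles, for each named request.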
Real-time observability & metrics
This involves putting special code (agents or “instrumentation”) directly into your application using tools like OpenTelemetry, Prometheus, or Datadog. These tools collect timing data for every single request as it happens. This allows you to calculate percentile metrics and view live dashboards in tools like Grafana, New Relic, or AppDynamics.
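As a sketch of what such instrumentation looks like, here is a minimal Prometheus histogram in Python (metric and endpoint names are illustrative); Prometheus computes percentiles from the recorded buckets when it scrapes:

```python
# Sketch using the official client: pip install prometheus-client
import random
import time
from prometheus_client import Histogram, start_http_server

REQUEST_LATENCY = Histogram(
    "api_request_duration_seconds",      # illustrative metric name
    "Time spent handling API requests",
    ["endpoint"],
)

def handle(endpoint):
    with REQUEST_LATENCY.labels(endpoint=endpoint).time():  # records duration
        time.sleep(random.uniform(0.01, 0.2))               # stand-in for real work

start_http_server(9100)   # Prometheus scrapes http://<host>:9100/metrics
while True:
    handle("/v1/livescores")
```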
Distributed tracing
For complex systems with many interconnected services (microservices), distributed tracing lets you see the detailed journey of a single request. It captures how much time is spent at each step, from one microservice to another, to database calls, and external integrations, helping you pinpoint exactly where delays occur.
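A minimal OpenTelemetry sketch in Python (span names are illustrative) shows the idea: nested spans each record their own duration, so the slow step inside a request stands out:

```python
# Sketch using the OpenTelemetry SDK: pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("orders-service")          # illustrative service name

with tracer.start_as_current_span("handle_order"):   # the whole request
    with tracer.start_as_current_span("auth.verify"):
        pass                                         # time spent in auth
    with tracer.start_as_current_span("db.query"):
        pass                                         # time spent in the database
```

In a real system the exporter would send spans to a tracing backend rather than the console, and context propagation would stitch spans from different services into one trace.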
Establishing baselines & alerts
– Set meaningful thresholds (SLA/SLO): Define clear goals for your API’s performance. For example, for real-time APIs, you might aim for P95 < 100 ms and P99 < 300 ms. For less time-sensitive APIs, higher values might be acceptable.
– Alert on percentile breaches: Set up rules to automatically send alerts when your latency metrics (especially percentiles) go above your defined thresholds. You can do this with tools like Prometheus or Datadog/New Relic.
– Optimise alert responsiveness: Use “rolling windows” (e.g., alert if P99 is above the threshold for 5 consecutive minutes) to avoid getting too many alerts from minor, temporary spikes.
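The rolling-window idea fits in a few lines of Python; in this sketch a P99 reading is assumed to arrive once per minute, and the alert fires only after five consecutive breaches:

```python
from collections import deque

class P99Alert:
    """Fires only when P99 exceeds the threshold for N consecutive checks."""
    def __init__(self, threshold_ms=300, consecutive=5):
        self.threshold_ms = threshold_ms
        self.window = deque(maxlen=consecutive)

    def check(self, p99_ms):
        self.window.append(p99_ms > self.threshold_ms)
        return len(self.window) == self.window.maxlen and all(self.window)

alert = P99Alert()
for minute, p99 in enumerate([120, 450, 310, 320, 350, 340, 390]):
    if alert.check(p99):
        print(f"minute {minute}: ALERT, P99 above 300 ms for 5 minutes running")
```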
Load testing & CI integration
– Pre-production benchmarking: Before you launch your API, use tools like k6, Gatling, or Locust to simulate a lot of user traffic. This helps you record how latency behaves under stress and ensure your percentile goals are met before your API goes live.
– CI/CD pre-merge checks: Integrate latency tests directly into your development pipeline (CI/CD). This means that before any new code is added to the main project, automated checks will run. If a change causes performance to get worse (increases latency), it can be stopped from reaching your production environment.
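A pre-merge gate can be as simple as hitting a staging endpoint repeatedly and asserting that the percentile budget still holds. A hypothetical sketch (URL, sample count, and budget are placeholders):

```python
import statistics
import time
import requests  # third-party: pip install requests

URL = "https://staging.example.com/v1/livescores"   # placeholder staging endpoint
BUDGET_P95_MS = 250

samples = []
for _ in range(50):
    start = time.perf_counter()
    requests.get(URL, timeout=5)
    samples.append((time.perf_counter() - start) * 1000)

p95 = statistics.quantiles(samples, n=100)[94]      # the 95th percentile
assert p95 <= BUDGET_P95_MS, f"P95 {p95:.0f} ms exceeds {BUDGET_P95_MS} ms budget"
```

Wired into CI, a failing assertion blocks the merge before the regression reaches production.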
Sportmonks API latency
At Sportmonks, a major provider of sports data, we take API latency very seriously: our clients need real-time performance for things like live scores, player stats, and betting odds.
Request filtering to reduce data and latency
Our documentation shows “Request Options” that allow developers to select only the specific data fields they need and ignore everything else. This keeps response payloads smaller and reduces the time data spends travelling over the network.
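For instance, a request that asks for only two fields and one relation might look like this in Python (parameter values are illustrative; see the Sportmonks documentation for the exact syntax):

```python
import requests  # third-party: pip install requests

resp = requests.get(
    "https://api.sportmonks.com/v3/football/fixtures",
    params={
        "api_token": "YOUR_TOKEN",
        "select": "name,starting_at",   # only the fields you need
        "include": "scores",            # only the relations you need
    },
    timeout=10,
)
print(len(resp.content), "bytes")       # a smaller payload travels faster
```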
Premium odds feed for speed
For applications where speed is absolutely critical (like real-time betting), Sportmonks partners with TXODDS to provide a “Premium Odds Feed.” This feed delivers updates with extremely low latency, which is essential when even tiny delays can change betting outcomes.
Rate limits for stable performance
Our system limits how many requests you can make (3,000 calls per endpoint per hour). This protects our infrastructure when there’s a lot of activity, preventing slowdowns and ensuring our services remain fast and reliable. Clients receive information in our responses that helps them manage their requests and slow down gracefully if they are approaching or hitting these limits.
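A client-side sketch of slowing down gracefully (the rate-limit field names below are assumptions for illustration; check the Sportmonks response reference for the exact structure):

```python
import time
import requests  # third-party: pip install requests

def polite_get(url, params):
    """Backs off when few calls remain in the current window (fields assumed)."""
    resp = requests.get(url, params=params, timeout=10)
    info = resp.json().get("rate_limit", {})        # assumed response field
    if info.get("remaining", 1) < 50:               # nearly at the hourly cap
        time.sleep(info.get("resets_in_seconds", 60))
    return resp
```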
Get faster results with Sportmonks
At Sportmonks, we keep things fast. Use include, select, and filters to reduce payload size, and rely on our Premium Odds Feed for lightning-fast updates when milliseconds count. Our clear rate limits and optimised endpoints ensure smooth, scalable performance.
Power your app with speed and explore Sportmonks’ real-time data today.
FAQs about latency
What are the main components of API latency?
An API request’s total latency is the sum of three parts:
– Queue time (waiting before processing),
– Service/processing time,
– Network delay (round trip).