Updated August 2021. Content subject to change. Copyright (c) 2021 Sienda Multimedia Ltd. All rights reserved.
Latency in an AVB system (or any other ‘over Ethernet’ system) can be measured in a variety of ways. It is not always straightforward to compare quoted latencies between different systems, as the values are not always obtained using the same measurement method, and sometimes it is not even stated exactly which latency has been measured/quoted.
In general there are three different types of latency that are useful to understand in an AVB (or AoIP) system: network latency, device latency, and total latency.
These are depicted in the following illustration:
We will start with a discussion of network latency, as this latency varies depending on the size of the network (number of hops).
There are two different values that fall under the ‘network latency’ title: the max transit time and the presentation time offset.
Max transit time is the maximum time it can take a stream packet to get across the network, from the Talker to the Listener. This is the actual worst-case latency of the network between a given Talker and Listener device for a given stream. Please note that this value can be different for different streams, as it depends partly on the amount of audio data in each packet. The AVB standards guarantee that the network can deliver all the stream packets across the network within this max transit time.
The presentation time offset is a configurable latency value assigned to each talker stream, which determines when the audio in the stream will be presented in the Listener device. The presentation time offset defaults to 2ms, but may be modified (per stream) in an AVB controller. Listener devices will buffer incoming audio until the presentation time and then present it to the i2s/DAC etc. This is the mechanism that allows all devices on the network to reproduce the audio at exactly the same/correct time.
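As a minimal sketch of this mechanism (the function names and the capture times used are illustrative assumptions, not part of any real AVTP stack):

```python
# Sketch: a Talker stamps each packet with capture time + presentation time
# offset; the Listener buffers the audio until that time arrives, so every
# Listener on the network presents the same sample at the same moment.

PRESENTATION_TIME_OFFSET_NS = 2_000_000  # default 2ms, configurable per stream


def stamp_packet(capture_time_ns: int,
                 offset_ns: int = PRESENTATION_TIME_OFFSET_NS) -> int:
    """Talker side: compute the presentation time carried in the packet."""
    return capture_time_ns + offset_ns


def ready_to_present(presentation_time_ns: int, now_ns: int) -> bool:
    """Listener side: hold the audio in the buffer until presentation time."""
    return now_ns >= presentation_time_ns


pt = stamp_packet(capture_time_ns=1_000_000)
print(pt)                                # 3000000: present 2ms after capture
print(ready_to_present(pt, 2_500_000))   # False - still buffered
print(ready_to_present(pt, 3_000_000))   # True - handed to the i2s/DAC
```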
The max transit time of an AVB network is usually around 140uS per hop for a gigabit link, or around 280uS per hop for a 100mbit link, but the exact value can be obtained from the network via an AVB controller application, such as Hive:
Here we can see that for a connected stream the reported max transit time (called accumulated latency in Hive) is 517286 nanoseconds. Please note that for Milan devices, max transit time and accumulated latency are the same thing. AVB networks guarantee delivery of stream packets within the latency promises, so you can safely set the presentation time offset of the talker stream to 518uS (0.518ms) and the stream is guaranteed to be delivered on time with no dropped packets.
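The rounding from a reported accumulated latency (in nanoseconds) up to a safe whole-microsecond presentation time offset can be sketched as follows (a hypothetical helper, not a Hive API):

```python
import math


def min_safe_offset_us(accumulated_latency_ns: int) -> int:
    """Round the reported max transit time up to whole microseconds.

    Any offset at or above this value guarantees on-time delivery."""
    return math.ceil(accumulated_latency_ns / 1000)


print(min_safe_offset_us(517286))  # 518 (uS), the value used in the text
```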
When comparing latency values of AVB networks with ‘over-IP’ standards, please bear in mind that the AVB accumulated latency value is the minimum network latency for guaranteed delivery. You can still set the latency lower, but then the network cannot guarantee delivery on time. Other ‘over-IP’ standards cannot guarantee delivery at all, so they will often quote latency figures that work ‘most of the time’, but if you want to be sure you will not drop the occasional frame, then use an AVB network with an appropriately set presentation time offset.
The default presentation time offset (latency) for an AVB talker stream is 2ms. Using the per-hop latency values from above, we can see that the default 2ms latency is good for 14 gigabit hops, or 7x 100mbit hops:
| | gigabit | 100mbit |
|---|---|---|
| latency per hop | 140uS | 280uS |
| number of hops in 2ms | 14 | 7 |
Of course, much ‘wider’ networks can be created, but the presentation time offset of streams traversing more than 14 hops should be set greater than 2ms.
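The per-hop budget arithmetic in the table above can be sketched as follows (using the approximate per-hop figures from this article):

```python
# Approximate max transit time contributed by each hop, per link speed.
PER_HOP_NS = {"gigabit": 140_000, "100mbit": 280_000}


def max_hops(budget_ns: int, link: str) -> int:
    """How many hops fit within a given presentation time offset budget."""
    return budget_ns // PER_HOP_NS[link]


print(max_hops(2_000_000, "gigabit"))  # 14 hops within the default 2ms
print(max_hops(2_000_000, "100mbit"))  # 7 hops within the default 2ms
```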
AVB networks not only guarantee delivery at the stated latencies, but do so with zero configuration, thanks to the Multiple Stream Reservation Protocol (MSRP) and the Credit Based Shaper (CBS). ‘Over-IP’ protocols are not able to compete with this, and even a legacy network manually configured with QoS cannot guarantee that all packets will arrive, because bursts of stream traffic can accumulate in switch ports, causing dropped packets. AVB’s Credit Based Shaper ensures that stream packets are evenly spaced on the wire, preventing such bunching of packets and reducing the buffering requirement of switches. In the following illustration we have a non-AVB switch receiving 8 streams of audio from non-AVB devices on each of two ports, and sending the 16 streams out a third port. You can see that in this simple example of 2 ingress ports the switch is required to have a buffer capable of holding 8 packets.
Compare this with an AVB switch receiving the same number of streams on the same number of ports from AVB devices, and you can see that the buffering required is only 1 packet.
Thus, a 3 port non-AVB switch requires approximately 8 times the memory (for packet buffering) of an AVB switch, just to ensure it won’t drop stream traffic. If we extend the example to a 24 or 48 port switch, or a large network where streams accumulate at multiple switches, then non-AVB switches need very large packet buffers. Many non-AVB switches simply don’t have enough memory to guarantee delivery of all stream packets.
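As a deliberately simplified model of the burst scenario above (an illustrative assumption, not switch-vendor data): if every ingress port delivers its packets back-to-back while the egress port drains at the same line rate, the egress queue peaks at roughly the traffic of all but one ingress port.

```python
def nonavb_worst_case_buffer(streams_per_port: int, ingress_ports: int) -> int:
    """Rough worst-case egress queue depth (in packets) for an unshaped
    switch when bursts from all ingress ports collide at one egress port."""
    return streams_per_port * (ingress_ports - 1)


# 8 streams on each of 2 ingress ports, as in the illustration:
print(nonavb_worst_case_buffer(8, 2))  # 8 packets, vs ~1 for CBS-shaped AVB
```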
Avnu certified AVB switches are tested to ensure they have sufficient buffering to meet the promises of AVB.
AVB endpoint devices (Talkers and Listeners) have their own device latency.
Talker devices (devices that send AVB streams) have an input latency, which is the time it takes a sample to get from the audio input (be that a microphone diaphragm, an analogue XLR input, AES/EBU socket, i2s interface or otherwise), to be packetised into a network packet, and be delivered to the ‘network’ for delivery.
Listener devices (devices that receive AVB streams) have an output latency, which is the offset between the presentation time of an audio packet and the time the first sample in the audio packet actually appears at the audio output (be that a loudspeaker, AES/EBU socket, i2s interface or otherwise).
Device latencies are device specific, but for a generic microcontroller based device the latency may be calculated by summing the latencies of the various processing stages in the device:
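The summation can be sketched as follows (the stage names and 125uS values are illustrative assumptions, not any particular device’s figures):

```python
# Hypothetical pipeline stages for a generic MCU-based Talker, each taking
# one 125uS class-A observation interval. Real devices will differ.
talker_stages_us = {
    "i2s receive / DMA buffer": 125,
    "packetisation":            125,
    "MAC/PHY egress":           125,
}

input_latency_us = sum(talker_stages_us.values())
print(input_latency_us)  # 375 uS for this three-stage example
```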
A multicore MCU (or CPU) looking to maximise throughput (number of audio channels and streams) may trade a 125uS increase in latency for an extra parallel pipeline stage:
An MCU with an advanced DMA engine, or an FPGA based solution, may achieve lower device latency by having fewer pipeline stages:
Likewise, the Listener device will have device output latency. For example:
The total latency of a particular path from the audio input of a Talker device, through the network, and out the audio output of a particular Listener device, is simply the sum of the Talker input latency, the presentation time offset (network latency), and the Listener output latency. Considering the example of a path from AVB microphone to AVB loudspeaker, it is the latency between an analogue signal entering the microphone and the same analogue signal being reproduced by the loudspeaker. It is a useful metric as it is representative of the latency that we as humans experience, and will often be the deciding factor in whether an AVB system is responsive enough for, say, live performance.
Total latency is also useful for developers of AVB systems as it is easily measurable and can be used to verify that AVB devices have been implemented correctly and are reporting their latencies correctly.
In a real example, we have an AVB Talker device with 375uS input latency, connected through a network reporting a max transit time of 704006ns (704uS), with a presentation time offset set to 1ms (1000uS), and a Listener device with 250uS output latency. We would expect the total latency of this chain to be:
total latency = Talker input latency + presentation time offset + Listener output latency
= 375 + 1000 + 250 uS
= 1625 uS
= 1.625 ms
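The sum above can be expressed as a one-line helper (hypothetical names; the figures are the ones used in this article’s examples):

```python
def total_latency_us(talker_in_us: float, pto_us: float,
                     listener_out_us: float) -> float:
    """Total path latency: Talker input + presentation time offset
    + Listener output, all in microseconds."""
    return talker_in_us + pto_us + listener_out_us


print(total_latency_us(375, 1000, 250))  # 1625 uS = 1.625 ms
print(total_latency_us(680, 1000, 750))  # 2430 uS = 2.43 ms
```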
The Talker and Listener devices both expose an i2s interface, and so the device latencies are between the i2s interface and the network. This makes it really easy to measure the total latency, as we can scope the i2s data pins (Talker input and Listener output) and apply a digital signal going from 0 (0x0) to -1 (0xFFFFFFFF) into the Talker input.
The resulting capture looks like:
We can see that the measured total latency is 1.625 ms. This confirms that the device latencies are spot on.
In another example we have an AVB Talker device with 680uS input latency, connected through a network reporting a max transit time of 517286ns (517uS), with a presentation time offset set to 1ms (1000uS), and a Listener device with 750uS output latency. We would expect the total latency of this chain to be:
total latency = Talker input latency + presentation time offset + Listener output latency
= 680 + 1000 + 750 uS
= 2430 uS
= 2.43 ms
If we connect a scope to the input and output of the chain (both analogue) and apply a click track, we can measure the total latency:
The measured total latency of 2.43ms matches our expectations (within the measurement accuracy of the scope).
For more information please contact:
Kieran Tyrrell
www.sienda.com
first name @sienda.com
+44 1392580556