I was interested in how Apple AirPlay works in my network. I am using an iPad to stream music to a Yamaha R-N500 network receiver. There is a great Unofficial AirPlay Protocol Specification which already describes many details of the protocols used. But since I am a networking guy, I captured the whole process in order to analyze it with Wireshark.
At the time of capturing I was using an iPad Air 2 with iOS 11 called Johannes-ei-Patt.local. The Yamaha R-N500 network receiver is listed as Wohnzimmer.local and was running firmware version 1.13. I captured the traffic at the receiver with the ProfiShark 1G network TAP from Profitap.
My home network consists of an AVM FRITZ!Box router, fritz.box (hence some of its packets in the trace), as well as another AirPlay server on a Raspberry Pi, jw-pi01.local (hence more multicast DNS packets than just those from the Yamaha receiver). The trace also contained other packets, such as ARP requests from other stations, which I filtered out in order to have an almost clean trace file to analyze.
During the capturing I did the following steps:
- at 19:34:59 UTC I opened the iPad
- at 19:35:44 UTC I selected “Wohnzimmer” as the AirPlay destination (which was already selected from a previous session)
- at 19:36:12 UTC I started to stream the music for 20 seconds (but the UDP stream ran a few seconds longer)
- at 19:37:05 UTC I closed the iPad
This is the tracefile packed within a zip:
Again, please consult the “Unofficial AirPlay Protocol Specification” for the details of how Apple devices discover AirPlay receivers (service discovery) and how audio/video streaming works. The interesting parts in this trace are multicast DNS on UDP port 5353 for service discovery, RTSP over TCP port 1030 for setting up the stream (the RTSP ANNOUNCE among other things), and UDP port 1303 for the audio stream itself.
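To make the service discovery part a bit more concrete, here is a minimal Python sketch of the kind of question the iPad multicasts: a DNS query for PTR records of `_raop._tcp.local`, sent to 224.0.0.251 on UDP port 5353 (RFC 6762). The function names are my own, and I did not compare the bytes with the iPad's packets, which usually carry more questions and known answers. Unlike the iPad, the sketch sends from an ephemeral source port instead of 5353; in that case responders should answer via unicast to that port (RFC 6762 calls this a legacy unicast response).

```python
import socket
import struct

MDNS_GROUP = "224.0.0.251"  # IPv4 multicast address for mDNS (RFC 6762)
MDNS_PORT = 5353


def build_ptr_query(service: str = "_raop._tcp.local") -> bytes:
    """Encode a DNS query with a single PTR question for `service`."""
    # Header: ID 0, flags 0 (standard query), QDCOUNT 1, all other counts 0
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by the zero-length root label
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in service.split(".")
    ) + b"\x00"
    # QTYPE 12 = PTR, QCLASS 1 = IN
    return header + qname + struct.pack("!HH", 12, 1)


def send_query(timeout: float = 2.0) -> list:
    """Multicast the query once and collect (address, raw bytes) replies
    until no reply arrives for `timeout` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    responses = []
    try:
        sock.sendto(build_ptr_query(), (MDNS_GROUP, MDNS_PORT))
        while True:
            data, addr = sock.recvfrom(9000)
            responses.append((addr, data))
    except socket.timeout:
        pass  # no further replies within the timeout
    finally:
        sock.close()
    return responses
```

Calling `send_query()` on the home network should return the raw answers of every RAOP receiver that replies; decoding the answer records is left out to keep the sketch short.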
Using NetworkMiner you can get an overview of the hosts in the trace and their incoming/outgoing sessions, as well as a glance at the DNS queries and answers. Note that NetworkMiner detects the Yamaha receiver as “Cisco”, which is definitely not the case ;)
Using Wireshark, you can first analyze the conversations for Ethernet, IPv4, TCP, and UDP. In each of them you can see the connections between the iPad and the receiver along with a lot of multicast traffic.
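A conversation table essentially groups packets by their endpoint pair, regardless of direction. As a rough sketch of that idea (not how Wireshark implements it), the following Python code reads a classic libpcap file with Ethernet frames and counts packets and bytes per IPv4 TCP/UDP conversation. It deliberately ignores pcapng, nanosecond-resolution pcap files, VLAN tags, and IPv6; a pcapng trace could first be converted with `editcap -F pcap`.

```python
import struct
from collections import defaultdict

ETHERTYPE_IPV4 = b"\x08\x00"


def iter_ipv4_packets(data: bytes):
    """Yield (timestamp, src, dst, proto, sport, dport, frame_len) per IPv4 frame.

    Only classic libpcap files with microsecond timestamps and the Ethernet
    link type are handled; other frames (IPv6, ARP, VLAN-tagged) are skipped.
    """
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # little-endian file
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic microsecond pcap file")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError("only Ethernet (link type 1) is handled")
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 34 or frame[12:14] != ETHERTYPE_IPV4:
            continue  # too short or not IPv4 in Ethernet II
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
        proto = ip[9]             # 6 = TCP, 17 = UDP
        src = ".".join(str(b) for b in ip[12:16])
        dst = ".".join(str(b) for b in ip[16:20])
        sport = dport = None
        if proto in (6, 17) and len(ip) >= ihl + 4:
            sport, dport = struct.unpack("!HH", ip[ihl:ihl + 4])
        yield ts_sec + ts_usec / 1e6, src, dst, proto, sport, dport, orig_len


def conversations(data: bytes) -> dict:
    """Count [packets, bytes] per conversation, independent of direction."""
    stats = defaultdict(lambda: [0, 0])
    for _ts, src, dst, proto, sport, dport, length in iter_ipv4_packets(data):
        # Sort the two endpoints so A->B and B->A land in the same bucket.
        a, b = sorted([(src, sport), (dst, dport)])
        stats[(proto, a, b)][0] += 1
        stats[(proto, a, b)][1] += length
    return dict(stats)
```

Usage would be something like `conversations(open("trace.pcap", "rb").read())`, where `trace.pcap` is a placeholder name.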
You should have a look at the multicast DNS packets (mDNS, UDP source and destination port 5353), which list the queries from the iPad and the answer from the receiver advertising a Remote Audio Output Protocol (RAOP) service.
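The RAOP answer is easier to read once you know its structure. As far as I understand the unofficial specification, the service instance name combines the receiver's MAC address and its speaker name (`<MAC>@<name>._raop._tcp.local`), and the TXT record carries key=value pairs describing the audio capabilities. The following sketch splits the instance name and decodes TXT record data as defined in RFC 6763, section 6. The function names and the sample values at the bottom are hypothetical, not taken from my trace.

```python
def parse_raop_instance(instance: str):
    """Split '<MAC>@<speaker name>._raop._tcp.local' into (MAC, speaker name)."""
    label = instance.partition("._raop._tcp")[0]
    hw_addr, _, speaker = label.partition("@")
    return hw_addr, speaker


def parse_txt_rdata(rdata: bytes) -> dict:
    """Decode TXT RDATA (a sequence of length-prefixed strings) into a dict."""
    result = {}
    i = 0
    while i < len(rdata):
        length = rdata[i]
        entry = rdata[i + 1:i + 1 + length].decode("utf-8", errors="replace")
        i += 1 + length
        if not entry or entry.startswith("="):
            continue  # empty string or missing key: ignored per RFC 6763
        key, sep, value = entry.partition("=")
        # A key without "=" is a boolean flag; only the first occurrence counts.
        result.setdefault(key, value if sep else True)
    return result


# Hypothetical values, not taken from my trace:
print(parse_raop_instance("001122AABBCC@Wohnzimmer._raop._tcp.local"))
# ('001122AABBCC', 'Wohnzimmer')
print(parse_txt_rdata(b"\x09txtvers=1\x04ch=2\x08sr=44100\x06tp=UDP"))
# {'txtvers': '1', 'ch': '2', 'sr': '44100', 'tp': 'UDP'}
```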
Then comes the interesting part: tcp.stream eq 13 (TCP destination port 1030) shows the RTSP ANNOUNCE among other things, while udp.stream eq 13 (UDP destination port 1303) carries the audio stream, which does not make much sense to look at in ASCII ;)
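The TCP stream is readable in ASCII because RTSP (RFC 2326) is framed like HTTP: a request line, CRLF-terminated headers, an empty line, and an optional body whose size is given by Content-Length. Here is a minimal parser sketch, assuming one complete request is already in a single buffer. The example ANNOUNCE message (URI, CSeq, SDP body) is made up for illustration and not copied from my trace.

```python
def parse_rtsp_request(raw: bytes):
    """Split one complete RTSP/1.0 request into (method, uri, headers, body)."""
    head, _, rest = raw.partition(b"\r\n\r\n")  # headers end at the first empty line
    lines = head.decode("utf-8").split("\r\n")
    method, uri, version = lines[0].split(" ", 2)  # e.g. "ANNOUNCE rtsp://... RTSP/1.0"
    if not version.startswith("RTSP/"):
        raise ValueError("not an RTSP request line")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    body = rest[:int(headers.get("content-length", "0"))]
    return method, uri, headers, body


# Illustrative request, not taken from the trace:
sdp = b"v=0\r\na=rtpmap:96 AppleLossless\r\n"
msg = (b"ANNOUNCE rtsp://192.0.2.10/1234 RTSP/1.0\r\n"
       b"CSeq: 3\r\n"
       b"Content-Type: application/sdp\r\n"
       b"Content-Length: " + str(len(sdp)).encode() + b"\r\n\r\n" + sdp)
method, uri, headers, body = parse_rtsp_request(msg)
print(method, uri, headers["content-type"])
```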
Note that I tried to decode this stream as RTP (right mouse click -> Decode As -> Current -> RTP) in order to analyze it via Telephony -> RTP -> RTP Streams, but this failed with “codec is unsupported”. This is probably because RAOP encrypts the audio data with AES (ref: Wikipedia).
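Even if the payload cannot be decoded, the fixed 12-byte RTP header in front of it (RFC 3550, section 5.1) should still be readable, assuming, as I understand it, that RAOP only encrypts the audio payload. Parsing it is a quick way to check whether the UDP packets really are RTP: the version should be 2 and the sequence number should increase by one per packet. This sketch does not skip a header extension if the X bit is set, so such an extension would end up in `payload`.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed RTP header plus CSRC list (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("too short for an RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    header_len = 12 + 4 * csrc_count  # CSRC identifiers follow the fixed header
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[header_len:],  # the (presumably encrypted) audio data
    }
```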
Finally, you can have a look at the I/O graph, which shows the packets per second. The 20 seconds of audio streaming are clearly visible.
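With a one-second interval, the I/O graph boils down to counting packets per second. Given a list of packet timestamps (for example exported from Wireshark or read from the pcap file), this sketch does the same binning and finds the first and last busy second, which is one way to measure the length of the streaming phase. The threshold is a hypothetical parameter that has to be chosen by looking at the graph.

```python
from collections import Counter


def packets_per_second(timestamps):
    """Count packets per 1-second interval, starting at the first timestamp."""
    if not timestamps:
        return []
    t0 = min(timestamps)
    counts = Counter(int(t - t0) for t in timestamps)
    return [counts[i] for i in range(max(counts) + 1)]  # empty seconds count as 0


def busy_span(counts, threshold):
    """Return (first, last) interval whose packet count exceeds threshold, or None."""
    busy = [i for i, c in enumerate(counts) if c > threshold]
    return (busy[0], busy[-1]) if busy else None
```

The streaming duration in seconds is then `last - first + 1` for the span returned by `busy_span`.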
In another trace I was interested in the bandwidth used over a certain amount of time. For this I streamed one single song (Skillet – Feel Invincible; you can also see this title in the pcap trace) to the AirPlay receiver. While the m4a file on my iPad has a size of 7.28 MiB (variable bit rate, overall bit rate 265 kb/s) and a length of 3:49 min, the complete UDP stream amounted to about 32 MB. This corresponds to a bit rate of about 1.2 Mbps, more than four times the bit rate of the file itself.
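The bit rate arithmetic, spelled out as a small Python sketch. Since I did not note whether the 32 MB are 10^6 or 2^20 bytes, both variants are computed. As a reference point, the sketch also computes the rate of uncompressed 16-bit stereo PCM at 44.1 kHz; that comparison is my own addition and not something I verified in the trace.

```python
MIB = 1024 ** 2


def bit_rate_bps(size_bytes: float, duration_s: float) -> float:
    """Average bit rate in bits per second."""
    return size_bytes * 8 / duration_s


duration = 3 * 60 + 49                               # 3:49 min = 229 s
file_rate = bit_rate_bps(7.28 * MIB, duration)       # the m4a file on the iPad
stream_rate_mb = bit_rate_bps(32 * 10**6, duration)  # if "32 MB" means 10^6 bytes
stream_rate_mib = bit_rate_bps(32 * MIB, duration)   # if it means MiB
pcm_rate = 44_100 * 16 * 2                           # 16-bit stereo PCM at 44.1 kHz

print(f"file:   {file_rate / 1e3:.0f} kb/s")
print(f"stream: {stream_rate_mb / 1e6:.2f} to {stream_rate_mib / 1e6:.2f} Mb/s")
print(f"PCM:    {pcm_rate / 1e6:.4f} Mb/s")
```

This yields about 267 kb/s for the file (close to the reported 265 kb/s; the file presumably also contains metadata) and 1.12 to 1.17 Mb/s for the stream, which includes the UDP/IP header overhead. The stream rate lies well above the file's and somewhat below uncompressed PCM, which would fit the iPad re-encoding the audio rather than sending the m4a file as-is, but I have not verified this.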