When you implement VoIP (Voice over IP), a whole slew of new protocols show up on your data network. If it’s your job to maintain that data network, you will need to become familiar with these new protocols.

When we talk about VoIP protocols, we can break them down into two broad categories: Call Control (or Signaling) protocols and Media protocols. Call Control protocols are responsible for the setup and tear-down of each call. These protocols are the ‘traffic cops’ of VoIP. They determine how calls are routed, negotiate various parameters for the call (like what codec to use) and keep track of the state of the call throughout its lifetime. Media protocols, on the other hand, are responsible for transporting the voice: they get the little blocks of data that represent your speech (or other media like video or fax) to the receiving end.

Get Under Control

Many different Call Control protocols exist. Some are proprietary, like Cisco’s SCCP (Skinny) protocol. (MGCP, which is often mentioned alongside Skinny on Cisco systems, is actually published as an IETF RFC.) Today, the trend is toward open standards. Most VoIP phone systems will be compatible with these open protocols, even if they implement other, proprietary protocols. Some are older, more mature standards (like H.323) that have fallen out of favor, but still do yeoman’s work providing call control on many VoIP systems. Still others, like T.38, are very specialized, in this case managing Fax over IP sessions. Let’s take a look at each of these:


Session Initiation Protocol (SIP) is the protocol used by many VoIP phone systems to set up and tear down calls. SIP is a text-based, human-readable protocol based on a Request-Response model. Today, SIP is the most common Call Control protocol in use for VoIP. The core SIP specification defines six Requests (Invite, Ack, Options, Cancel, Bye and Register), with extensions adding others, and six types of Responses (1xx - Provisional, 2xx - Successful, 3xx - Redirection, 4xx - Client Error, 5xx - Server Error, 6xx - Global Failure). SIP uses these messages to provide the following functions:

  • User Location: Determination of the end system that will be used for the call.
  • User Availability: Determination of the willingness (availability) of the called party to engage in a call.
  • User Capabilities: Determination of the media and parameters which will be used for the call.
  • Session Setup: ‘Ringing’, and establishment of the session parameters at both the calling and the called party.
  • Session Management: Transfer and termination of sessions, modification of session parameters, and invoking services.
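
As a rough illustration of the Request-Response model described above, here is a small Python sketch that looks at the first line of a SIP message and says whether it is a Request or a Response. The function name `classify_start_line` and the sample lines are made up for this example, and a real SIP stack parses far more than the first line.

```python
# The six Requests and six Response classes named in the text above.
REQUEST_METHODS = {"INVITE", "ACK", "OPTIONS", "CANCEL", "BYE", "REGISTER"}

RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_start_line(line):
    """Return ('request', method) or ('response', class name).

    Hypothetical helper for illustration only; it does not validate
    headers or the message body.
    """
    parts = line.strip().split(" ", 2)
    if len(parts) < 2:
        raise ValueError("not a SIP start line: %r" % line)
    if parts[0].startswith("SIP/"):
        # Responses look like: SIP/2.0 180 Ringing
        code = int(parts[1])
        return ("response", RESPONSE_CLASSES[code // 100])
    if parts[0] in REQUEST_METHODS:
        # Requests look like: INVITE sip:bob@example.com SIP/2.0
        return ("request", parts[0])
    raise ValueError("unknown method: %r" % parts[0])


print(classify_start_line("INVITE sip:bob@example.com SIP/2.0"))
print(classify_start_line("SIP/2.0 180 Ringing"))
```

Because SIP is plain text, this kind of quick inspection is also what you end up doing by eye when you read a packet capture.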

This diagram shows a typical SIP call setup sequence:

SIP calls start with a request, an Invite.
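
Call Control protocols also keep track of the state of each call, so here is a toy Python sketch of a basic call as seen from the caller's side. The message order (Invite, 100 Trying, 180 Ringing, 200 OK, Ack, then Bye and 200 OK at the end) is the common textbook flow. The state names and the `next_state` function are my own simplification, not something taken from the SIP specification.

```python
# Simplified caller-side view of a single call.
# State names are illustrative assumptions, not SIP terminology.
TRANSITIONS = {
    ("idle", "send INVITE"): "calling",
    ("calling", "recv 100 Trying"): "calling",
    ("calling", "recv 180 Ringing"): "ringing",
    ("calling", "recv 200 OK"): "answered",
    ("ringing", "recv 200 OK"): "answered",
    ("answered", "send ACK"): "established",
    ("established", "send BYE"): "terminating",
    ("terminating", "recv 200 OK"): "terminated",
}


def next_state(state, event):
    """Look up the next call state, rejecting out-of-order events."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError("unexpected %r while %s" % (event, state))


def run_call(events):
    """Walk through a list of events and return the final state."""
    state = "idle"
    for event in events:
        state = next_state(state, event)
        print("%-18s -> %s" % (event, state))
    return state


TYPICAL_CALL = [
    "send INVITE",
    "recv 100 Trying",
    "recv 180 Ringing",
    "recv 200 OK",
    "send ACK",          # media (RTP) flows while the call is established
    "send BYE",
    "recv 200 OK",
]

final_state = run_call(TYPICAL_CALL)
```

A real phone also has to handle timeouts, error responses and calls ended by the other side, all of which are left out here.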


H.323 has been around as a Call Control protocol since 1996. As such, it is sort of the ‘grandfather’ of the genre. It is also a very mature and complete specification. However, it is less flexible and extensible than SIP, and has therefore fallen a bit out of favor. But it is still used in many applications where it is well suited, like video conferencing, and in some more mature VoIP platforms.

Are you old enough to remember the VHS/Beta wars, when video cassette recorders (VCRs) came out? VHS won, but most techies understood that Beta was the superior technology.  The truth is probably that H.323 is a superior protocol for VoIP, but SIP has won the market. Game over. You can argue all you want, but SIP is not going away.


The T.38 protocol deals specifically with the transmission of fax signals over an IP network. T.38 uses techniques such as data redundancy (re-sending earlier packets alongside new ones) and Forward Error Correction (FEC) to compensate for the vagaries of the IP network connection. This should not be confused with ECM (Error Correction Mode), which is a feature of the fax protocol itself. Please see my blog post here for more details on T.38.

Liberal Media

Media protocols are responsible for transporting the voice in VoIP from sender to receiver. Media protocols have a completely different set of requirements than Call Control protocols. For example, Call Control protocols must ensure that each command/response is received. For this reason, they either run over TCP (as H.323 call signaling does) or, when they run over UDP (as SIP often does), they add their own acknowledgements and retransmissions, where the receiver lets the sender know that a message was received.

Media protocols, on the other hand, deal with real-time data. In this case, there is no time for acknowledgements or re-tries if something is not received. For this reason, Media protocols will use UDP rather than TCP. If a packet is lost, so be it. VoIP systems are designed to accommodate a small amount of packet loss. Typically, systems will interpolate to fill in the gap for a dropped packet.
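
To make the "fill in the gap" idea concrete, here is a deliberately simple Python sketch. It assumes audio arrives as a list of equal-sized frames of samples, with `None` marking a frame that never arrived. The function name `conceal_losses` and the averaging approach are my own illustration. Real phones and codecs use more sophisticated packet loss concealment, and they can only look at the next frame if it is already sitting in the receive buffer.

```python
def conceal_losses(frames, frame_size):
    """Replace lost frames (None) with a guess based on their neighbors.

    Toy approach for illustration:
      - both neighbors present: average them sample by sample
      - only one neighbor present: repeat it
      - no neighbors at all: play silence
    """
    repaired = []
    for i, frame in enumerate(frames):
        if frame is not None:
            repaired.append(list(frame))
            continue
        # Previous frame may itself be a concealed one; that is fine here.
        prev_frame = repaired[-1] if repaired else None
        next_frame = frames[i + 1] if i + 1 < len(frames) else None
        if prev_frame is not None and next_frame is not None:
            filled = [(a + b) / 2 for a, b in zip(prev_frame, next_frame)]
        elif prev_frame is not None:
            filled = list(prev_frame)
        elif next_frame is not None:
            filled = list(next_frame)
        else:
            filled = [0] * frame_size
        repaired.append(filled)
    return repaired


# Three tiny two-sample frames; the middle one was lost in transit.
received = [[0, 10], None, [20, 30]]
print(conceal_losses(received, frame_size=2))
```

This works acceptably for an isolated lost packet. When several packets in a row go missing, any guess gets worse quickly, which is why sustained packet loss is so audible.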

The other big difference between Signaling Protocols and Media Protocols is the end points. SIP messages are typically (but not always) exchanged between the phone (or other terminal device) and the SIP Proxy (which in many VoIP scenarios will be the IP PBX). Media protocols, on the other hand, typically (but not always) go directly between terminal devices and bypass the SIP Proxy.

Whether we are talking about a SIP or an H.323 system, both use RTP (Real-time Transport Protocol) as their Media Protocol. T.38 fax is the exception: it is usually carried over its own transport, UDPTL, rather than RTP.


Real-time Transport Protocol (RTP) is the protocol used by VoIP phone systems to transport voice, video or any other real-time media streams. RTP defines a standardized packet format for delivering audio and video over IP networks. RTP actually refers to a pair of protocols, RTP and RTCP (Real-time Transport Control Protocol). RTCP does not actually transport any media, and it does not set up the call either (that is the job of the Call Control protocol). Instead, it monitors the conversation, collecting statistics such as packet counts, latency and jitter, which gives feedback on quality of service (QoS), and it aids synchronization of multiple streams. RTP/RTCP services include payload type identification, sequence numbering, time stamping and delivery monitoring.
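
If you want to see where the payload type, sequence number and timestamp actually live, the following Python sketch unpacks the 12-byte fixed header found at the start of every RTP packet. The function name `parse_rtp_header` and the sample values are made up for illustration, and the sketch ignores any optional CSRC list or header extension that may follow the fixed header.

```python
import struct


def parse_rtp_header(packet):
    """Decode the 12-byte fixed RTP header from the start of a packet."""
    if len(packet) < 12:
        raise ValueError("too short to be an RTP packet")
    # Network byte order: two single bytes, a 16-bit sequence number,
    # a 32-bit timestamp and a 32-bit SSRC (stream identifier).
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": first >> 6,            # 2 for current RTP
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": first & 0x0F,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,    # e.g. 0 = G.711 mu-law (PCMU)
        "sequence_number": seq,           # lets the receiver spot loss and reordering
        "timestamp": timestamp,           # lets the receiver play samples at the right time
        "ssrc": ssrc,
    }


# A hand-built sample header: version 2, payload type 0, made-up values.
sample = struct.pack("!BBHII", 0x80, 0, 1234, 160, 0xDEADBEEF)
header = parse_rtp_header(sample)
print(header)
```

The sequence number and timestamp are what make it possible for the receiver to notice dropped packets and to measure jitter.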


Secure Real-time Transport Protocol (SRTP) takes the RTP protocol and layers on authentication and encryption. This provides greater security, including protection against man-in-the-middle attacks and message replay. As with RTP, there is a matching control protocol, SRTCP, for Secure Real-time Transport Control Protocol. SRTCP provides all the same services as RTCP, just in a secure manner.

Please also check out my discussion here, where I talked about some basic tasks you can perform to prep your network for this new VoIP traffic. Have you got VoIP protocols running on your network? What impact have you seen? Share your thoughts in the comments.

