Motorola - The Packet Voice Revolution - Gabriel Dusil • Generation Crypto

Introduction

For the first time in over 100 years, the telecom industry is undergoing a paradigm shift in voice communications. The evolution of the 1990s has taken us from traditional circuit-switched communication to a new form of voice communication based on packets. Since the formation of the Internet over 25 years ago, data networks have revolutionized the communication industry, and they have finally brought the benefits of their technology to the voice market. Has the development of the Public Switched Telephone Network (PSTN) reached the end of the line? This seems to be the case, as datacom equipment brings more and more value-added features that are not found in today’s telecom industry. The telecommunications industry is opening its eyes to a new era of packet voice. Data has traditionally played a relatively small role in the overall communications industry, representing approximately 150 billion US$ in revenue per year, versus a mammoth telecommunications industry of well over a trillion US$. Regardless of absolute market size, there are well-founded reasons for the telecom industry’s interest in data networking: according to Dataquest, the telecom market has grown a mere 8% over the last two years, while data communications has grown an impressive 23%. Today’s consumers are beginning to realize the financial and technical benefits of packet voice[1]. As with any paradigm shift, consumers speculate about the stability and viability of the new technology. A comparative analysis of the current packet voice solutions, such as Voice over Frame Relay (VoFR) and Voice over IP (VoIP), leads to a better understanding of them. An early understanding of the technology behind packet voice, as provided in this article, gives a big-picture view from which to assess future requirements for integrating multi-media into a single managed network.
The advent of packet voice is expected to make a sizeable dent in world-wide voice revenues. According to the International Data Corporation, as many as 16 million Internet Telephony users are expected by the year 1999. By the year 2002, packet voice revenues are expected to equal those of PSTN voice. These startling predictions have drawn the attention of both datacom and telecom players. The next battle is to capture a significant portion of this rapidly developing market.

The Data Evolution

Let’s face it: the Public Switched Telephone Network has reached the end of its rope. Circuit switching can no longer keep up with market demands in the way that packet-switched networks can, namely in the effective expansion of new services, value-added features, and cost-effective growth. Although the past decade has brought enhancements to the PSTN through Computer Telephony Integration (CTI), the end result has been an expensive effort to integrate existing Information Technology (IT) into a circuit-switched network. Why not try it the other way around? Rather than integrating computers into the existing telephone network, consider integrating telephones into the existing data network. The VoIP and VoFR markets have taken this approach. Packet voice gives communications a fresh beginning, bringing with it all the existing services of the PSTN we have grown to love, such as call waiting, conference calls, and hunt groups. Packet voice goes a step beyond, by making it possible to:

Save up to ten times the existing bandwidth by using efficient data compression algorithms.
Support future integration of interactive multi-media features such as video.
Manage one network for all communication traffic: voice, data, and video.
Use existing, well-established communication protocols such as Frame Relay or IP, while supporting heterogeneous architectures.
Provide flexibility for end users in controlling the corporate telephone bill, allowing long-distance savings right up to the desktop. For example, end users will have the flexibility to dial 6 (and then the phone number) for communication over Frame Relay, 7 for IP, and 8 for the PSTN.
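
The access-code example in the last item can be pictured as a small lookup on the first dialed digit. The following is a minimal sketch only: the table mirrors the digits 6, 7, and 8 from the example above, while the function name `select_route` and the error handling are hypothetical and not taken from any particular product.

```python
# Hypothetical access-code table mirroring the example in the text:
# 6 = Frame Relay, 7 = IP, 8 = PSTN.
ROUTES = {"6": "Frame Relay", "7": "IP", "8": "PSTN"}


def select_route(dialed: str) -> tuple[str, str]:
    """Split a dialed string into (transport, remaining phone number)."""
    if not dialed or dialed[0] not in ROUTES:
        raise ValueError(f"no route for dialed string {dialed!r}")
    return ROUTES[dialed[0]], dialed[1:]


if __name__ == "__main__":
    # The first digit picks the transport, the rest is the phone number.
    for number in ("65551234", "75551234", "85551234"):
        print(number, "->", select_route(number))
```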

Packet voice communications is proving to be the first stepping-stone in a future overflowing with real-time multi-media.

Switching Circuits or Packets

Packet switching is proving to be a winning strategy for telecom providers, so much so that telecom carriers are beginning to re-position themselves in an effort to maximize their revenue potential. The appeal of packet switching lies not just in the explosive expansion of subscribers, but in the cost savings to both the consumer and the provider. Let’s take a step back and look at today’s communication model. The modern telephone network is based on tried and true circuit switching technology, whereby one dedicated physical connection is required for each telephone call. Thus, one channel represents one physical connection, as shown in figure #1. In this case an analog voice signal of 3.1 kHz is sent to a local exchange, where it is time division multiplexed (TDM) with other voice channels connected to the same exchange. When the signal reaches its destination it is demultiplexed and continues on to the receiver. Unfortunately, the analog nature of the PSTN is where the problem lies. Development of analog equipment has traditionally proven to be a difficult task, as confirmed by the complexity of the V.34 modem specification.
The Future Is Digital! We have seen evidence of digital technology in almost every aspect of our lives: from mainframes to personal computers, from CD-ROMs to Digital Video Disc (DVD) and Dolby Digital, to name only a few. Believe it or not, even our refrigerators and cars are digital! This evolution has not happened by chance, but rather by necessity. Digital product development is down to a science, and digital technology has proven to be a more flexible means of producing faster and more efficient appliances. New communication protocols such as Frame Relay, Digital Subscriber Line (DSL), and Asynchronous Transfer Mode (ATM) are all based on this digital foundation. The transition from analog to digital has been an important step in this migration.
The V.90 modem specification itself is evidence of the bridge between the analog PSTN and the digital ISDN domain. The next step in digital telephony is packet voice. In this scenario, usage of a connection is based on information units, which are independent of the physical media. Information units may be packets, frames, or cells, depending on which protocol is used, but in all cases they are communicated over a shared network, as shown in figure #2. Furthermore, these information units are sent over separate virtual channels, which are also independent of the media. Each packet is identified by a header, which may include the channel it is using, where it came from (i.e. the source, or transmitter), and where it is going (i.e. the destination, or receiver).
Providers in this packet market are able to capitalize on the underlying benefit of shared communication, namely the ability to sell more bandwidth than is actually available. The reason lies in the statistical nature of network usage. Since subscribers are not expected to use all of their purchased bandwidth 24 hours per day, 7 days a week, the provider can sell to more subscribers without expanding the backbone infrastructure, thereby increasing its revenue potential. For example, 500 subscribers may each have purchased 64 Kbps of bandwidth, but use only 25% of it on average. The provider could therefore theoretically sell to four times the number of users (2000 subscribers) without adversely affecting network performance. This results in a win-win scenario between the customer and the supplier, whereby the provider increases its revenue potential while passing the cost savings down to the customer. This winning solution has proven successful in data communications, and has subsequently opened the eyes of the telecom players.
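
The oversubscription arithmetic in the example above can be checked with a few lines of code. This is a rough sketch that assumes average utilization is the only constraint; it ignores busy-hour peaks, which a real provider would have to plan for. The function names are illustrative.

```python
def sellable_subscribers(current_subscribers: int, average_utilization: float) -> int:
    """Subscribers the same backbone could carry if everyone used 100% of
    the average share actually consumed today (a simplifying assumption)."""
    if not 0 < average_utilization <= 1:
        raise ValueError("average_utilization must be in (0, 1]")
    return int(current_subscribers / average_utilization)


def average_load_kbps(subscribers: int, purchased_kbps: float,
                      average_utilization: float) -> float:
    """Average backbone load generated by the subscriber base."""
    return subscribers * purchased_kbps * average_utilization


if __name__ == "__main__":
    # Figures from the text: 500 subscribers, 64 Kbps each, 25% average use.
    print(sellable_subscribers(500, 0.25))      # 2000
    print(average_load_kbps(500, 64, 0.25))     # 8000.0 Kbps of the 32000 Kbps sold
```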

How Do I Sign Up?

The Internet is ubiquitous. Subscribing to the Internet is an easy task, whether it be via a local Internet Service Provider (ISP), Competitive Local Exchange Carrier (CLEC), or a Public Operator. But packet voice communications has brought about a misconception that VoIP actually means Voice over the Internet.

Figure #1. Analog Voice through a Circuit Switched Network, PSTN

Figure #2. Packet Voice through a packet switched network, IP or Frame Relay

The behavior of the various types of multi-media traffic can be viewed graphically in relation to bandwidth requirements, tolerance of packet loss, and tolerance of delay, as shown in figure #3. Based on this analysis, we can conclude that voice delay must be kept to a minimum in order to maintain an acceptable level of quality. At the same time, in today’s private networks, the added bandwidth needed to handle enterprise-wide voice is radically lower than the fat communication pipes needed for video, graphics, and data traffic from applications such as Web browsing or video conferencing.
Unfortunately, the ease of sending data over the Internet cannot be carried over to real-time communications such as voice or video. Frankly, the Internet is not ready for the aggressive demands of handling real-time data. As a matter of fact, the Internet may never be ready for voice and video. Analysts predict that it will take at least three to five years before the Internet is ready to handle real-time quality of service and the management of delay-sensitive voice traffic. Consider the massive growth in data traffic alone: the number of users is expected to grow from 10 million today to 60 million by the year 2000, according to Forrester Research, Inc. The bandwidth requirements for data alone are challenging. In short, the immediate future for packet voice is in dedicated networks, such as the services offered by Internet Telephony Service Providers (ITSPs) or CLECs.
When dealing with multi-media traffic and quality of service, where the public Internet disappointingly fails, the private enterprise shines with top marks. The seamless implementation of packet voice in existing corporate networks is quickly proving to be successful, showing once again that enhancements to the corporate network provide added ammunition in fighting the competition. The choice at present comes down to two transport protocols: Frame Relay and the Internet Protocol.
In many cases availability alone may be the deciding factor when implementing packet voice. From a multi-national perspective, Frame Relay has yet to achieve ubiquitous availability. Where it is unavailable, IP will be the only alternative. Nevertheless, from a quality and service standpoint, as explained in the remainder of this article, the best solution may be to install Frame Relay where available, then use IP in the remaining locations. A third alternative is to run a mixed environment and deploy gateway devices which join both networks together. Packet voice gateways are responsible for ensuring end-to-end communication of voice traffic, managing bandwidth, and maintaining a high level of quality of service.
Frame Relay has gained wide acceptance as a reliable and cost-effective network protocol. Most of the developing countries in Central and Eastern Europe have already rolled out their services. In countries where Frame Relay is limited or unavailable, corporations have implemented this protocol in private networks in order to take advantage of its Statistical Time Division Multiplexing (STDM) capabilities, relative ease of use, and low latencies. Global carrier alliances have expanded their availability of Frame Relay, and deregulation is also expected to strengthen its success. In addition, new services such as Switched Virtual Circuits (SVC) are expected to reinforce the Frame Relay market by bringing the benefits of switched services to an already strong portfolio. From the standpoint of the early adopters, the following market segments will find immediate benefits in packet voice technology:

Enterprise customers currently utilizing Frame Relay services – In this case, VoFR becomes a significant value-add and cost saver to the corporation. Customers have the option to choose either VoFR or VoIP at the access point, and then use Frame Relay as the transport protocol, taking advantage of its low latency and high efficiency.
Enterprise customers currently utilizing IP – In these networks, VoIP is an add-on to the existing infrastructure, without disturbing the integrity of existing applications. In most cases, bandwidth is not an issue. But VoIP has prerequisites, such as IP-based Quality of Service (QoS), namely the implementation of the Real-time Transport Protocol (RTP) and Type of Service (ToS). In addition, sophisticated prioritization and queuing schemes will be required in order to ensure that voice traffic is given preference over data.
Internet Telephony Service Providers (ITSP) – These customers will design packet voice networks from the ground up. Key considerations are the use of leading-edge gateway functions which seamlessly bind IP networks with Frame Relay networks, and compatibility with various types of telephone exchanges.

Figure #3. Behavior & Requirements of Multi-media traffic

Free Bandwidth

The financial benefits alone have been the main driving force behind the packet voice hype. Until new providers begin to publicize their tariffs, the cost benefits of this technology will remain speculative. At this point a simple comparison can be made between the cost of today’s circuit-switched voice and the cost of using the same connection with compressed voice, as illustrated in table #1. In private networks, the savings of compressed voice can be mapped directly to the phone bill:

What had initially cost 800 US$ per month now costs 70 US$, using G.723.1 (i.e. 5.3 Kbps voice compression).
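
One way to sanity-check this figure is to assume, purely as a simplification that the article does not state, that the monthly cost scales linearly with the bandwidth consumed, and that the 800 US$ refers to an uncompressed 64 Kbps voice channel. Under those assumptions the result lands close to the quoted 70 US$.

```python
def scaled_monthly_cost(original_cost: float, original_kbps: float,
                        compressed_kbps: float) -> float:
    """Scale a monthly bill in proportion to bandwidth (assumed linear tariff)."""
    return original_cost * compressed_kbps / original_kbps


if __name__ == "__main__":
    # Assumed baseline: 800 US$ per month for a 64 Kbps voice channel.
    # G.723.1 at 5.3 Kbps is the compression rate named in the text.
    cost = scaled_monthly_cost(800.0, 64.0, 5.3)
    print(f"{cost:.2f} US$ per month")  # about 66 US$, in line with roughly 70 US$
```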

Table #1. Bandwidth Requirements in Telecom Networks versus Datacom Networks

Even when accounting for Frame Relay’s overhead, the phone bill increases by a mere 8 US$ per month. Unfortunately, the real-world cost savings are not so clear-cut. Despite countless articles announcing Voice over IP’s expected savings of up to 90%, the true figures will only be evident once this market has matured. Initially, CLECs and ITSPs will set their VoIP tariffs to recoup the cost of their new packet voice infrastructure. VoIP tariffs will also depend on factors such as:

Competitive pressures in the country (or, for that matter, the lack of competitive pressure, resulting in relatively higher prices).
Deregulatory issues in the individual countries.
The number of packet voice providers.

Radical discounts from packet voice operators won’t be obvious until competition and deregulation have forced prices down. Nevertheless, VoIP and VoFR promise to be the world’s cheapest form of voice communication. A financial analysis of the cost savings also requires an understanding of how Frame Relay Permanent Virtual Circuits (PVC) are charged. The local access connection itself may represent over 50% of the entire network investment. The costs associated with end-to-end connections in a private network break down into the elements shown in figure #4.

Figure #4. Structure of Frame Relay Tariff charges

In contrast, when using traditional leased lines, a dedicated connection is paid for 24 hours a day, 7 days a week, regardless of usage. The benefits of such a connection are only obvious when it is used for more than 2 to 6 hours per day[3]. For less intensive bandwidth requirements, a dial-up connection via the PSTN or ISDN may be more economical. In this case, the subscriber pays only for the time to establish the call plus the time spent using the connection. The economics of IP communications prove to be even more efficient, since the user literally pays only for the number of bytes sent over the network:

During call set-up (i.e. signaling information between PBXs), no bytes are sent.

Figure #5. End to End packet voice architecture

With Digital Speech Interpolation, when one caller is talking the other is listening. (Remember that a voice conversation is a half-duplex communication.) This means that the listener on either side of the conversation can send data in between speech.
During pauses in a voice conversation no bytes are sent.
When waiting for a receptionist to connect your call, no bytes are sent.

Moreover, the half-duplex nature of a voice conversation means a 50% reduction in data traffic. While the clock is ticking, the PSTN customer continues to pay for the connection. Not so with VoIP: pauses, listening, and waiting mean no packets are generated and no charges are added to the phone bill. There is additional potential for bandwidth savings in the voice bundling capabilities of multi-service access devices, which encapsulate several voice packets into one WAN packet. By bundling five voice packets, the overall bandwidth savings amount to over 30% in IP implementations, and over 7% in Frame Relay (figure #5).

Where Is My Guarantee?

Figure #6. Architectural implementations of connectionless (IP) versus connection-oriented (FR) networks

PSTN networks have proven time and again to be a very stable infrastructure. These networks have delivered reliability figures of 99.999%, which translates into a downtime of only about 5 minutes per year. By contrast, Frame Relay providers have offered network reliability anywhere from 99% (roughly 3.6 days of downtime per year) to 99.999%. The level chosen often depends on a Service Level Agreement (SLA), which invariably attaches a price tag to these up-time guarantees. The unpredictable nature of the Internet shows a lack of resilience in meeting the demands of the enterprise. When running mission-critical applications, a downtime of minutes can result in millions of dollars in lost revenue. Reliability in the Internet is not easily quantified, since communication is often described as “best effort” transport of packets.
IP alone, as a layer 3 protocol, has no mechanism to ensure end-to-end transport of information. In other words, once the message is placed in the envelope and dropped into the mailbox, nothing tells the sender that the message made it to the destination. TCP comes into play by providing these acknowledgements, in addition to re-transmitting damaged or missing packets. In reality, however, we need to go deeper than simply providing the connection: the difficulty with real-time communications over the Internet is its lack of flow control, quality of service, and bandwidth management. For this reason, private IP networks take on these responsibilities for voice traffic by implementing services such as the Real-time Transport Protocol (RTP), Type of Service (ToS), or the Resource Reservation Protocol (RSVP).
From an IP router’s point of view, if the network is congested or a node fails, the network is considered to be in an unstable state. Until the network reaches its converged (i.e. stable) state, there is a threat that packets will not reach their destination. The time of convergence (i.e. the time taken to recover to a stable state) depends on which routing protocol is used. In networks using the Routing Information Protocol (RIP), it takes 90 seconds for the network to learn of and re-route around a failed node. During this time packets may be dropped. This may not matter for voice, since the loss of a few packets will not adversely affect the quality of the conversation, and if data packets are dropped, TCP ensures that they are retransmitted. The other option is to use the Open Shortest Path First (OSPF) routing protocol, which reduces convergence to a few seconds at the expense of a more complicated network design than RIP. From the user’s point of view, when a router fails, packets are dynamically rerouted around the node without interruption to the communication stream. This is a key advantage of the connectionless nature of IP. Compare this to Frame Relay, where a failed node means an immediate loss of communications: a fundamental disadvantage of a connection-oriented protocol.
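
The reliability percentages quoted earlier in this section convert to yearly downtime with simple arithmetic. The sketch below assumes a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected yearly downtime for a given availability figure."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (100.0 - availability_percent) / 100.0 * MINUTES_PER_YEAR


if __name__ == "__main__":
    for nines in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(nines)
        print(f"{nines}% -> {minutes:.1f} min/year ({minutes / 1440:.2f} days)")
```

With these assumptions, 99.999% works out to a little over 5 minutes per year and 99% to about 3.65 days, matching the figures in the text.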

Figure #7. Combined Architectural implementations of IP and FR networks

Today’s Frame Relay networks rely on Permanent Virtual Circuits (PVC) for network connectivity. When a link fails, the entire PVC is rendered inoperable. Network integrity is then maintained by including low-cost dial-backup connections wherever financially feasible. Options include:

Implementing a Frame Relay backup port, whereby a failed link results in automatic switching to a second PVC.
Installing ISDN dial backup ports. In this case a failed PVC results in establishing an ISDN call in less than 800 ms. The application is never aware of the broken connection.
Requesting a subscription to a Frame Relay SVC service which offers more dynamic methods of re-establishing connections.

Although a guarantee of 100% reliability may not be feasible in today’s networks, consider purchasing as many nines in your SLA as fit in your budget. As with buying travel insurance, when disaster strikes you will be happy that you made the extra investment.
IP does a great job in ease of configuration. Adding a device to an IP network could be as simple as configuring the IP address[4] and setting the IP mask. The routing protocol then updates all neighboring tables. Frame Relay paints a different picture in node scalability. The PVC nature of Frame Relay somewhat complicates the configuration of these networks, which are based not on a destination and a source address, but rather on a Data Link Connection Identifier (DLCI) that defines the end-to-end connection. To illustrate this complexity, take a fully meshed architecture of 9 nodes, where every node must be connected to every other. If a 10th node is added to this network, then 9 new PVCs, each with its own DLCI configuration, are required to ensure that all nodes are interconnected. Likewise, adding a node to a 1000-node network means configuring 1000 new PVCs. The bigger the meshed network, the stronger the business case for deploying IP. For this reason, star architectures (also known as hub and spoke architectures) have proven to be more practical for Frame Relay and acceptable for most networking requirements, since all traffic passes through a regional concentrator and then on to its final destination (figure #6). For large corporate networks as well as public offerings, these two architectures may be combined to offer the advantages of both worlds, as shown in figure #7. The distributed traffic of a meshed architecture prevents congestion in any single node, and the star architecture provides simplicity of configuration in the branch locations.
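
The scaling argument can be made concrete by counting PVCs. The sketch below counts circuits rather than individual DLCI entries (each PVC needs a DLCI configured at both of its ends), and the function names are illustrative.

```python
def full_mesh_pvcs(nodes: int) -> int:
    """PVCs needed so that every node has a direct circuit to every other."""
    return nodes * (nodes - 1) // 2


def star_pvcs(nodes: int) -> int:
    """PVCs needed when every branch connects only to one hub."""
    return max(nodes - 1, 0)


def new_pvcs_for_added_node(existing_nodes: int) -> int:
    """Extra PVCs when one node joins an existing full mesh."""
    return full_mesh_pvcs(existing_nodes + 1) - full_mesh_pvcs(existing_nodes)


if __name__ == "__main__":
    for n in (9, 10, 100, 1000):
        print(n, "nodes: mesh", full_mesh_pvcs(n), "PVCs, star", star_pvcs(n), "PVCs")
    print("adding a node to a 9-node mesh:", new_pvcs_for_added_node(9), "new PVCs")
```

A star grows by one PVC per added branch, whereas a full mesh grows by one PVC per existing node, which is why the mesh becomes impractical first.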

Multi-Media and Jitter

Figure #8. Simplified representation of a memory buffer in a packet network which controls jitter

An added benefit of packet technology is its ability to combine various traffic types from existing applications. The network becomes a shared medium for transmitting multi-media information and allows all applications[5] to be managed over a single network. This capability introduces new challenges, such as jitter delay. Jitter is an inherent consequence of carrying the real-time traffic of voice and other multi-media over a shared connection: inevitably, one application will have to wait while another is transmitting. Memory buffers therefore become important at both the transmitting and receiving ends. When the transmitter is sending traffic from two applications, a buffer is used to regulate the incoming data from the applications and the outgoing transmission of this data onto the WAN connection[6], as shown in figure #8. A similar delay occurs when two voice conversations share the same connection.
A shared connection such as Frame Relay’s (FR) Statistical Time Division Multiplexing (STDM) allows only one frame to be transmitted at any given moment. In other words, each frame sent over a Frame Relay connection has access to the entire available bandwidth of a given link. Therefore, while one caller is sending voice packets, a second caller or application must wait in a memory queue. At the receiving end, the jitter buffer is configured to wait for these delayed voice packets. Jitter is calculated as the delay between any two packets entering the receiver’s buffer. All factors being equal, when only one voice conversation is present on the Wide Area Network (WAN) connection, the jitter delay is zero, since all packets are received one after another (i.e. like cars traveling in a single lane on the highway). As we add voice conversations or LAN applications (i.e. more lanes that must converge into one), jitter increases. As a result, jitter may become a significant deterrent to voice quality if applications are not controlled or proper prioritization schemes are not in place.
As previously mentioned, to control the quality of a voice conversation a jitter buffer is required at the destination to ensure that we wait for delayed packets. When the jitter buffer has received a sufficient number of packets, we decompress them and convert them back to an analog voice signal. However, we can only wait so long for packets to enter the buffer, because the caller must perceive a smooth conversation in real time. In both Frame Relay and IP (i.e. using UDP), a jitter delay of 50 to 150 ms is an acceptable time limit. The ability to dynamically change the size of the jitter buffer is ideal, in order to optimize voice quality for different network characteristics. After all, why wait unnecessarily for 150 ms if all the voice has arrived after 50 ms?
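
To illustrate the idea of a dynamically sized jitter buffer, the sketch below measures how far each packet's spacing deviates from the nominal packet interval and then picks a buffer depth within the 50 to 150 ms window mentioned above. It is a deliberately simple illustration, not the algorithm of any particular product: the sizing rule (worst observed deviation, clamped to the window) and all names are assumptions.

```python
def interarrival_jitter_ms(arrival_times_ms, packet_interval_ms):
    """Deviation of each inter-arrival gap from the nominal packet interval."""
    gaps = [later - earlier
            for earlier, later in zip(arrival_times_ms, arrival_times_ms[1:])]
    return [abs(gap - packet_interval_ms) for gap in gaps]


def jitter_buffer_ms(jitters_ms, floor_ms=50, ceiling_ms=150):
    """Pick a buffer depth: the worst observed jitter, clamped to the window.

    The clamp limits (50 and 150 ms) come from the text; using the worst
    observed deviation as the target is an assumption made for this sketch.
    """
    worst = max(jitters_ms, default=0)
    return min(max(worst, floor_ms), ceiling_ms)


if __name__ == "__main__":
    # Packets sent every 30 ms; the fourth one was held up behind other traffic.
    arrivals = [0, 30, 60, 170, 180]
    jitters = interarrival_jitter_ms(arrivals, 30)
    print(jitters)                     # [0, 0, 80, 20]
    print(jitter_buffer_ms(jitters))   # 80
```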

Figure #9. Various locations of delay in a data network

Making Sure it Works

Understanding the characteristics of IP and Frame Relay allows us to make clear decisions in designing networks. Outlining the differences between protocols, as we have done in this paper, allows a smoother integration of new technologies into existing networks and helps address future requirements. Where is the fine line between high quality and acceptable quality? For packetized voice communications, the biggest concern in the enterprise is the attainable quality of voice. The answer lies in striking a balance between voice delay and packet loss.

Would you want 70% of high quality voice traffic with no delay, or 100% of voice with 400 ms delay?

Acceptable voice quality lies somewhere in the middle. Tests have shown the following:

In terms of Packet Loss – Acceptable voice quality is attainable with less than 5% packet loss. Unacceptable voice quality starts at 15% packet loss.
In terms of Packet Latency – Round trip delay is detectable by the ear at around 150 ms. Poor quality begins at 500 ms.
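
The thresholds above can be combined into a simple rating function. This is only a sketch: the numeric limits come from the text, while the three labels and the treatment of the middle ground as "marginal" are assumptions made for illustration.

```python
def rate_voice_quality(packet_loss_pct: float, round_trip_ms: float) -> str:
    """Rate a voice path using the loss and delay thresholds quoted in the text."""
    # Unacceptable: 15% loss or more, or round trip delay of 500 ms or more.
    if packet_loss_pct >= 15 or round_trip_ms >= 500:
        return "unacceptable"
    # Acceptable: under 5% loss and delay below the ~150 ms detection point.
    if packet_loss_pct < 5 and round_trip_ms < 150:
        return "acceptable"
    # Everything in between is labeled "marginal" (an assumed label).
    return "marginal"


if __name__ == "__main__":
    print(rate_voice_quality(2, 100))    # acceptable
    print(rate_voice_quality(8, 100))    # marginal
    print(rate_voice_quality(2, 400))    # marginal
    print(rate_voice_quality(20, 100))   # unacceptable
```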

Based on these studies, voice can tolerate a reasonable loss of packets and still maintain a high level of quality (unlike data traffic, which must be received exactly as it was transmitted). For this reason, TCP is not recommended for voice: its acknowledgement and re-sequencing responsibilities introduce needless delays in the voice stream. The mechanisms in TCP which ensure 100% data transfer are overkill for voice. UDP/IP is therefore the preferred protocol combination for IP voice, thanks to the non-guaranteed nature of the User Datagram Protocol (UDP). In other words, if a packet is received out of sequence, or received in error, we simply discard it, with negligible change in quality.
Due to the high cost of WAN connections, it is important to assess both the immediate needs of the corporation and the future growth potential in voice capacity. We can apply Moore’s law, whereby processing power doubles every 18 months. When this axiom is factored into future WAN growth, data traffic alone begins to have a significant impact on bandwidth requirements. An analysis of the corporate phone bill during the peak hours of usage will give a fair indication of the added voice bandwidth that will be required. The analysis of peak traffic requirements over a typical week provides a “worst case” scenario, which can be compensated for, depending on the transport protocol used. Remember to allow for future bandwidth-intensive applications such as video surveillance or video conferencing.
Finally, don’t overlook the fact that communication appliances require local power in order to remain in operation. The -48 V DC power backup provided by your local telephone company won’t come in handy during a power outage. In the world of data communications, a severed connection due to a blackout or brownout may mean lost data. Steps should be taken to ensure a consistent flow of data by using:

Uninterruptible Power Supplies (UPS)
ISDN backup connections
Redundancy in power supplies
Or installing redundant communication equipment, which sits in standby mode.

Redundancy has become an important aspect of today’s data networks. Failures can occur anywhere in the network connection. Whether the fault lies in the backbone link, backbone switch, access link, or access device, backup solutions ensure that critical connections remain operational. In mission-critical applications it is vital to determine the weak link in the chain, in order to minimize the risk of downtime. From a regulatory point of view, if packet voice is a serious consideration, it is important to research the local law before endeavoring to implement VoIP or VoFR. Many developing countries, such as those in Central & Eastern Europe, are still dominated by a PTT monopoly which controls the voice market. At the same time, many of these countries do not have clear-cut definitions of what is actually regulated when voice is sent as data. In the end, it is wise to invest some time in understanding the local telecommunication regulations, just to be on the safe side.

Demanding Voice Quality

The design of packet voice appliances is up against tough obstacles in today’s data market:

Minimizing device latency while at the same time maximizing compression.
Efficient use of bandwidth, while ensuring enough Central Processing Unit (CPU) power to juggle bandwidth management resources.
Providing a plethora of value added features such as voice broadcasting, centralized switching tables, and PBX signalization, while maintaining a competitive price position.

In data communications, delay is a fact of life. By understanding the effects of delay we can design packet voice networks to minimize this effect, thereby maximizing quality. To begin with, delay resides in various locations of the network, as shown in figure #9.

Access Device Latency is the time required for the packet voice appliance to process the voice signal. This may include:
- An analog to digital conversion
- Digital compression of the signal
- Segmenting into packets
- Encapsulating into a chosen Wide Area Network (WAN) protocol
- And finally sending this information out the network port.
Serial Latency is the time from the moment a packet begins to enter the network link until the moment it has exited the link. This link may be a copper pair, optical fiber, a coaxial connection, or simply air. This delay is dependent on clock speed and, to a lesser extent, the speed of light.
Backbone Device Latency is the delay in a backbone node due to processing information units and forwarding them to another backbone port.
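
As a rough illustration of serial latency, the sketch below computes how long it takes to clock one packet onto a link, and how that time accumulates when a packet crosses several store-and-forward links. The function names are hypothetical, and the link rate is assumed to be 64 x 1024 bit/s, which reproduces the rounded figures quoted in this section (a rate of 64,000 bit/s gives slightly higher values).

```python
def serial_delay_ms(packet_bytes: int, link_bps: float) -> float:
    """Time needed to clock one packet onto a link, in milliseconds."""
    return packet_bytes * 8 / link_bps * 1000


def path_serial_delay_ms(packet_bytes: int, link_bps: float, links: int) -> float:
    """Serial delay over several links of equal speed.

    Assumption: every node receives the whole packet before forwarding it
    (store and forward), so each link adds one full serial delay.
    """
    return links * serial_delay_ms(packet_bytes, link_bps)


# Assumption: "64 Kbps" is taken as 64 x 1024 bit/s.
LINK_BPS = 64 * 1024

if __name__ == "__main__":
    for size in (1518, 64):
        one_link = serial_delay_ms(size, LINK_BPS)
        two_links = path_serial_delay_ms(size, LINK_BPS, links=2)
        print(f"{size:>4} bytes: {one_link:6.1f} ms on one link, "
              f"{two_links:6.1f} ms across two links")
```

Device latency in each node is left out here on purpose; it would be added on top of the serial delay for every hop.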

Practically speaking, if we send a 1518-byte packet[7] over a 64 Kbps link, then our serial delay amounts to 185 milliseconds. Now compare this to a delay of less than 8 milliseconds when sending a packet that is less than 5% of that size (i.e. a 64-byte packet over the same 64 Kbps link). In other words, the smaller the packet, the faster it gets to the other side. Therefore, voice engineers opt for relatively small packet sizes in order to minimize delays during real-time communications. Data engineers, on the other hand, prefer large packet sizes to ensure that data is sent to the receiver with as little overhead as possible. Packet size becomes a fundamental consideration when designing data communication appliances for voice transmission.

To minimize the delay of real-time voice or video, data packets must be segmented into smaller information units. Typically the size of the segmented data packets is similar to that of the voice packets. Conversely, it is important to return data packets to their original size when voice is not present. This ensures overall efficiency of all applications in the network.

But just how big should the voice packets be? The trick in the implementation is to design a packet size that keeps the delay to a minimum, without compromising voice quality. To bring this into perspective, studies have shown that round trip delay becomes noticeable to the human ear at around 150 ms, and intolerable above 500 ms. Therefore, a packet of voice must overcome all latencies in both directions (i.e. from the access device, link connection, and backbone nodes) in less than 500 ms. The end result is an optimal voice packet size in the range of 20 bytes to 128 bytes, depending on the characteristics of the network.

Delay in data networks is not a characteristic of distance, but of clock speed. Consider a 400 km stretch of leased line, with an additional router placed in the middle, thus dividing the connection into two 200 km segments. It now takes twice as long to reach the destination, since the packet must be clocked onto a link twice. Bear in mind that each hop will also result in additional device latency, as well as serial delay, as shown in figure #10.

The chosen packet size may have a significant impact on delay when overhead is added. As shown in figure #11, Frame Relay adds an additional 6 bytes to a voice packet payload of 64 bytes:

Two bytes for flags
Two bytes for the header
The remaining two bytes for the Frame Check Sequence (FCS).

IP adds up to 28 bytes:

20 bytes for Internet Protocol (IP)
8 bytes for the User Datagram Protocol (UDP) header

For IP, the result is that overhead makes up a minimum of 30% of the overall packet (i.e. 28 bytes of a 92-byte IP packet in the example provided earlier), compared to 9% when using Frame Relay (6 bytes of a 70-byte frame). One option for reducing the overall network overhead is to use Voice Bundling. This allows several voice packets to be encapsulated in one WAN information unit (i.e. a frame in Frame Relay, or a packet in IP). The end result is a reduction in network overhead to only 8% in IP and only 2% in Frame Relay! In summary, the overall increase in packet size translates directly into the bandwidth requirements of IP vs. Frame Relay. In the case of a 5.3 Kbps compression algorithm using 64-byte packets, Frame Relay requires 5.8 Kbps of bandwidth, and IP requires 7.6 Kbps.[8] The options don’t stop there; by using RTP header compression in VoIP, the IP header is decreased from 28 bytes down to 6 bytes, on par with Frame Relay.
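
The overhead and bandwidth figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the bundling case assumes five 64-byte voice packets per frame with no extra per-packet sub-header, a bundle size chosen because it reproduces the 8% and 2% figures quoted in the text.

```python
FR_OVERHEAD_BYTES = 6    # 2 flags + 2 header + 2 FCS
IP_OVERHEAD_BYTES = 28   # 20 IP + 8 UDP
PAYLOAD_BYTES = 64       # voice payload used in the example


def overhead_share(overhead: int, payload: int, bundled: int = 1) -> float:
    """Fraction of the transmitted unit that is protocol overhead."""
    total = overhead + payload * bundled
    return overhead / total


def link_kbps(codec_kbps: float, overhead: int, payload: int, bundled: int = 1) -> float:
    """Link bandwidth needed to carry a codec stream once overhead is added."""
    voice_bytes = payload * bundled
    return codec_kbps * (overhead + voice_bytes) / voice_bytes


if __name__ == "__main__":
    for name, overhead in (("Frame Relay", FR_OVERHEAD_BYTES), ("IP/UDP", IP_OVERHEAD_BYTES)):
        single = overhead_share(overhead, PAYLOAD_BYTES)
        # Assumption: five voice packets bundled into one frame or packet.
        bundle = overhead_share(overhead, PAYLOAD_BYTES, bundled=5)
        needed = link_kbps(5.3, overhead, PAYLOAD_BYTES)
        print(f"{name:<11} overhead {single:.0%} single, {bundle:.0%} bundled, "
              f"{needed:.1f} Kbps for a 5.3 Kbps stream")
```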

Figure #10. End to End delay when an additional hop is added

As a closing note, additional overhead used to keep the network in a state of convergence must also be calculated into our example. The overhead in this case refers to data traffic required to maintain stability in the network. For instance, IP networks require routing protocols such as Routing Information Protocol (RIP) or Open Shortest Path First (OSPF) to ensure reliable routing, in addition to control protocols such as Address Resolution Protocol (ARP) and Internet Control Message Protocol (ICMP). The expense of adding this traffic to the network is far from justified, due to the ability for IP to dynamically route around failed nodes.

Although the overhead of 28 bytes in UDP/IP may not appear significant, consider the analogy of collecting pennies. Is it worth the time and effort? If not, then consider a collection of 10 million pennies. Now is it worth some attention? The answer now becomes obvious: what was initially considered irrelevant becomes a major issue of concern. In the real world, an average corporation of 50 employees may send as many as 6 million bytes per month. If two million bytes are calculated as overhead, then maybe the cost of the protocol is worth a second thought.

In terms of link latency in the backbone, PSTN circuit switching environments are limited only by the speed of light[9]. To bring this into perspective, consider a transatlantic telephone call between parties separated by 6000 km. In a PSTN network the one-way delay amounts to approximately 20 ms, or 40 ms round trip. This delay is purely a function of distance and the speed of electrons over this connection. Although this delay may seem insignificant, it paints a different picture in satellite communications, where the ground to satellite delay amounts to 120 ms (i.e. satellites are fixed at a geosynchronous earth orbit of 36,000 kilometers from the earth’s surface). The result is a round trip delay of 480 ms, not to mention delays incurred by additional device latency. For these reasons, packet voice communication over satellite links becomes a delicate design issue in maintaining tolerable voice quality. VoFR defaults as the technology of choice, due to its added efficiency and lower latency in satellite communications when weighed against VoIP.

Setting aside network congestion for the moment, it is important to understand the limitation of router latency in the backbone compared to Frame Relay. To begin with, routers are responsible for carrying out several tasks required of an OSI[10] layer three device:

Performing a cyclic redundancy check
Reading the destination IP address
Checking the routing table for the port to which the packet must be sent
And forwarding the packet to the appropriate port

All of these tasks add delay in the realm of tens of milliseconds. Frame Relay switches have the advantage, in that processing is only required for two communication layers. The end result is a switching latency in the region of hundreds of microseconds: a fraction of the time taken by routers, and negligible for any voice conversation.
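
The delay figures discussed in this section can be combined into a simple round trip budget. The sketch below is a minimal illustration, not a planning tool: it uses the speed-of-light simplification from footnote [9], the 150 ms and 500 ms thresholds quoted earlier, and a purely hypothetical device latency of 15 ms in each direction.

```python
SPEED_OF_LIGHT_KM_PER_S = 300_000  # simplification, see footnote [9]
GEO_ALTITUDE_KM = 36_000           # geosynchronous orbit, from the text


def propagation_ms(distance_km: float) -> float:
    """One-way propagation delay over a given distance, in milliseconds."""
    return distance_km * 1000 / SPEED_OF_LIGHT_KM_PER_S


def round_trip_ms(one_way_km: float, device_ms_each_way: float = 0.0) -> float:
    """Round trip delay: propagation plus device latency, in both directions."""
    return 2 * (propagation_ms(one_way_km) + device_ms_each_way)


def rate_delay(rtt_ms: float) -> str:
    """Classify a round trip delay using the thresholds quoted in the text."""
    if rtt_ms > 500:
        return "intolerable"
    if rtt_ms >= 150:
        return "noticeable"
    return "comfortable"


# One direction of a satellite call goes up to the satellite and back down.
satellite_path_km = 2 * GEO_ALTITUDE_KM

if __name__ == "__main__":
    bare = round_trip_ms(satellite_path_km)
    # Assumption: 15 ms of total device latency per direction (hypothetical).
    loaded = round_trip_ms(satellite_path_km, device_ms_each_way=15)
    print(f"propagation only: {bare:.0f} ms ({rate_delay(bare)})")
    print(f"with devices:     {loaded:.0f} ms ({rate_delay(loaded)})")
```

Even a small amount of device latency pushes the satellite case past the 500 ms mark, which is why every millisecond of access and backbone latency matters on such links.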

Figure #11. Voice Packet size in Frame Relay versus IP

Carrying Packets Over the Backbone

Despite the large pipes in the backbone, they are not exempt from quality issues. When sending multi-media traffic, the backbone is continuously battling to control network congestion. Therefore, it is important to understand the backbone in order to make a clearer assessment of its capabilities (figure #12). An analysis of any public network can be simplified by comparing the total bandwidth available (as provided by the carrier) with the number of subscribers to the network. Simply speaking, if a provider has a backbone capacity of 622 Megabits per second (Mbps), and has subscribed 10,000 customers, then each customer is entitled to a maximum of 62.2 Kbps during peak hours of the day. Although this analysis is not entirely accurate for real world networks, the calculation is important in determining the provider’s ability to provision bandwidth to new subscribers. In short, a subscriber may pay for a 512 Kbps access connection, but if the backbone is not ready to offer this bandwidth, then some of the traffic will not make it to the other side.

In private networks the effect of congestion can be minimized by properly designing the network and forecasting traffic requirements. Network managers may use mainstream network analyzers or network management tools to determine the link usage during any given window of time. Then, implementation comes down to commissioning enough bandwidth to accommodate either peak or average traffic. Unfortunately some providers have neglected to follow this model, resulting in oversubscription of bandwidth by as much as 600%. Luckily we are able to control traffic patterns in private networks. Flow control becomes an important tool during times of congestion. Frame Relay provides this assurance via explicit and implicit flow control mechanisms.

Implicit Flow Control requires the proper configuration of the Committed Information Rate (CIR) and a Committed Burst Size (B_C) to ensure that each application remains within its subscribed bandwidth. This ensures that all applications sharing bandwidth in the backbone will reach their destination.
Explicit Flow Control in Frame Relay refers to Backward Explicit Congestion Notification (BECN) and Forward Explicit Congestion Notification (FECN), which notify the source and destination devices respectively that traffic must slow down, otherwise data may be lost.

Both mechanisms work with the data buffers in each node to ensure that data is preserved when a link is incapable of holding additional traffic.
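
The subscriber arithmetic used earlier in this section can be written as a quick check. The sketch below uses hypothetical function names, and the oversubscription example assumes that every one of the 10,000 subscribers buys a 512 Kbps access line, which is an illustrative worst case rather than a figure from any real carrier.

```python
def fair_share_kbps(backbone_mbps: float, subscribers: int) -> float:
    """Bandwidth per subscriber if everyone transmits at the same time."""
    return backbone_mbps * 1000 / subscribers


def oversubscription_pct(access_kbps: float, subscribers: int, backbone_mbps: float) -> float:
    """Total access bandwidth sold, as a percentage of backbone capacity."""
    sold_kbps = access_kbps * subscribers
    return sold_kbps / (backbone_mbps * 1000) * 100


if __name__ == "__main__":
    share = fair_share_kbps(backbone_mbps=622, subscribers=10_000)
    # Assumption: all subscribers have a 512 Kbps access connection.
    ratio = oversubscription_pct(access_kbps=512, subscribers=10_000, backbone_mbps=622)
    print(f"peak-hour share per subscriber: {share:.1f} Kbps")
    print(f"access bandwidth sold: {ratio:.0f}% of backbone capacity")
```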

Figure #12. Determining Bandwidth requirements in the Backbone

Voice Standards and Signaling

We have come a long way since the early 80’s, when digital was first introduced to us in the form of compact disc (CD) players, desktop personal computers (PCs), and Integrated Services Digital Network (ISDN) services. We have matured from the primal excitement of hearing an analogue signal reproduced from a digital source, to the 90’s where we now demand excellent voice quality in as little bandwidth as possible. The following real-time voice compression standards were introduced in the middle of the nineties to address these market requirements:

G.729 provides an 8:1 compression of voice (8 Kbps stream), often referred to as “toll quality” voice, and is based on the Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP) algorithm. A recent variation called G.729A uses the same algorithm, but is based on a simpler codec, and subsequently uses fewer CPU cycles when processing the voice stream.
G.723.1 offers up to 12:1 voice compression and has also emerged as a widely accepted algorithm by providing 5.3 Kbps or 6.3 Kbps voice streams. This algorithm is often referred to as “business quality”. G.723.1 at 5.3 Kbps is based on Algebraic-Code-Excited Linear-Prediction (ACELP), whereas G.723.1 at 6.3 Kbps uses Multi-pulse Maximum Likelihood Quantization (MP-MLQ).
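
The compression ratios quoted above follow directly from the stream rates, taken relative to an uncompressed 64 Kbps PCM voice channel. The short sketch below makes that arithmetic explicit; the dictionary and function names are hypothetical.

```python
PCM_KBPS = 64.0  # uncompressed PCM voice channel

# Stream rates as quoted in the text.
CODEC_KBPS = {
    "G.729 (CS-ACELP)": 8.0,
    "G.723.1 (MP-MLQ)": 6.3,
    "G.723.1 (ACELP)": 5.3,
}


def compression_ratio(codec_kbps: float) -> float:
    """How many times smaller the codec stream is than a PCM channel."""
    return PCM_KBPS / codec_kbps


if __name__ == "__main__":
    for name, rate in CODEC_KBPS.items():
        print(f"{name:<18} {rate:>4.1f} Kbps  ->  {compression_ratio(rate):4.1f}:1")
```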

Figure #13. Importance of Signal conversion in ensuring ‘Any to Any’ connections

The Frame Relay Forum has endorsed both G.729 and G.723.1. Details can be found in the FRF.11 specification. Two optional classes are supported: Class 1 uses an Adaptive Differential Pulse Code Modulation (ADPCM) algorithm called G.727, and Class 2 uses G.729A. VoIP vendors have adopted the H.323 specification, which uses G.711 as the default audio standard. This standard involves a straightforward analogue to digital conversion of a 4 kHz voice signal and results in a Pulse Code Modulated (PCM) voice channel of 64 Kbps. The same standard is used in today’s Integrated Services Digital Networks (ISDN). G.729 and G.723.1 have also been adopted by H.323 as optional algorithms.

As a general rule, higher compression results in lower quality. This is becoming arguable as compression algorithms become more sophisticated. The choice between them will come down to market acceptance, availability, and customer expectations. At this stage both standards are well established and accepted in the packet voice industry[11]. All aspects of network planning, such as expected growth, quality, and budget constraints, also play a role in deciding which one to use.

It is important to note that voice compression is independent of the voice signaling required between different Local Exchanges (LX) or Private Exchanges (PBX). The access devices must be able to interpret, and convert between, different voice signaling protocols, some of which are listed below:

Ear and Mouth (E&M) Immediate, E&M Wink, proprietary E&M implementations etc.
Foreign Exchange Station (FXS)
Common Channel Signaling (CCS)
CCS Q.SIG
Channel Associated Signaling (CAS)
ISDN Primary Rate Interface (PRI)
ISDN Basic Rate Interface (BRI)
And other proprietary signaling standards

Figure #14. Gateway functionality between Frame Relay and IP networks

In other words, two access devices may talk the same compression language, but if they don’t understand the meaning behind the words then they won’t be able to establish communication. Voice compression standards can communicate over hybrid networks, as shown in figure #14. The advantage to the consumer is that the compression algorithm is independent of the transport protocol. Interoperability between networks is a matter of deploying a gateway device, which acts as a translator between these protocols. The communication is seamless to the extent that when a packet from the Frame Relay network enters the gateway, the gateway strips the Frame Relay header and replaces it with an IP header. The opposite function is performed in the reverse direction.

One of the most important design considerations is ensuring voice signalization compatibility. In corporate networks, Private Telephone Exchanges are outfitted with various interfaces. Support for signalization between these different types of PBXs is necessary to give each branch location a seamless telephone connection. Figure #13 shows several voice signaling standards, any of which may connect to a different signaling interface on the receiving end, depending on how the telephone call is switched (or routed) in the network. Signalization compatibility and conversion between different standards is vital to a seamless and stable packet voice network. Conformance to standards helps ensure that equipment from various vendors can communicate efficiently with one another. Bear in mind that the key to an efficient and stable network is testing, and more testing.

Remember that packet voice solutions provide not only substantial savings on the corporate phone bill, but also a long list of value-added features in their technology portfolio. Look for network enhancements such as:

Digital Speech Interpolation – Providing the possibility to send data during pauses in a voice conversation.
Echo Cancellation – Eliminating near-end or far-end echo. Echo is an annoying characteristic of telephone networks which mainly results from an impedance mismatch from the two wire to four wire conversion in Local Exchange hybrids.
FAX over IP or Frame Relay – Integrating fax communications over low cost packet networks.
Voice Broadcast – Sending announcements to many branches with one voice transmission.
Centralized Switching Table – This feature greatly simplifies network expansion by configuring only one node for voice switching. This master table manages switching for the entire network.
Support for pulse dialing and DTMF dialing – Provides compatibility between different telephone sets.
Hunt Groups – Customers only have to dial one number into your organization instead of trying several numbers because each one is busy. This one public telephone number is virtually connected to several others. The packet voice access device acts as a private exchange in this case by always freeing up the main number, and dynamically connecting each caller to a free line.
Voice/Data over Single DLCI – This amounts to large cost savings by allowing Data Frames and Voice Frames to share the same Frame Relay DLCI. The customer effectively pays for one DLCI connection for all outgoing calls.
Alternate Call Routing – This feature allows urgent calls to be sent over an alternate route in cases where the primary link is unavailable.
Voice Channel Bundling – The overall network bandwidth requirements and overhead are reduced by bundling several voice packets into an IP packet or Frame Relay frame.
Silence Suppression – Saves bandwidth by analyzing the speech pattern and determining the periods of silence. Information flow can be suspended during these pauses, and overall bandwidth is saved.
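
Of the features listed above, hunt groups lend themselves to a short illustration. The sketch below is a toy model under stated assumptions, not a description of any vendor’s implementation: the class name, the telephone number, and the line names are all hypothetical, and lines are simply searched in their configured order.

```python
from typing import Dict, List, Optional


class HuntGroup:
    """One public number that is virtually connected to several lines."""

    def __init__(self, public_number: str, lines: List[str]) -> None:
        self.public_number = public_number
        # Track which internal lines are currently in use.
        self.busy: Dict[str, bool] = {line: False for line in lines}

    def connect(self) -> Optional[str]:
        """Connect a caller to the first free line, or return None if all are busy."""
        for line, in_use in self.busy.items():
            if not in_use:
                self.busy[line] = True
                return line
        return None

    def release(self, line: str) -> None:
        """Free a line when its call ends."""
        self.busy[line] = False


if __name__ == "__main__":
    group = HuntGroup("555-0100", ["line-1", "line-2"])
    print(group.connect())  # first caller gets a free line
    print(group.connect())  # second caller gets the next free line
    print(group.connect())  # third caller finds every line busy
```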

Understanding the immediate requirements of the corporation versus the future plans for expansion is important during the planning stage. The reasons are simple: we expect a stable, reliable, and working solution. Assurance from the vendor that standards will be supported through modular software or hardware upgrades is important in ensuring investment protection.

Where Did All the Services Go?

In identifying the benefits behind Class of Service (CoS) and Quality of Service (QoS) it is important to distinguish between these services as they pertain to packet voice. CoS is an enhancement to data transportation which provides a priority scheme for different traffic types in the network. The original Frame Relay standard does not have a mechanism in place which prioritizes one packet over another. For this reason, CoS must be implemented at each access node, whereby data is prioritized before it enters the network. This brings us to the current implementations of today’s Voice over Frame Relay. In this scenario, voice, and any other real time traffic such as video, is prioritized ahead of data before entering the network. Once the information enters the network cloud, all frames are treated equally.

IP version 4 also does not have a CoS mechanism in place. IP version 6 (or IPng, for Next Generation) implements a priority scheme, but this version is not expected to be in widespread use until well into the next decade. IP version 6 also introduces additional latency due to a larger header size of 40 bytes in a basic configuration. In the meantime, Real-time Transport Protocol (RTP), Type of Service (ToS), and Resource Reservation Protocol (RSVP) are promising to provide the interim solution to CoS voice communications in today’s private IP networks. ToS is one of the simplest methods for implementing prioritization in today’s IP infrastructure, simply because the ToS field has been a part of the original IP version 4 specification. Although the ToS field is only one byte, its implementation was considered too complicated, which is the reason why it has remained dormant all these years. But in today’s multi-media boom, the advantages outweigh the disadvantages compared to competing implementations such as RSVP. The ToS field characterizes delay and throughput, and can work together with routing protocols such as OSPF in offering end to end prioritization of packets in the network. Many vendors are beginning to implement ToS prioritization, providing the first steps of IP Class of Service.

Quality of Service enhances end to end communications by guaranteeing bandwidth. By definition, the circuit switching nature of the PSTN offers a high level of QoS, simply because a connection from the transmitter to the receiver in the PSTN is dedicated to two users alone, and the notion of the link dropping during a voice conversation doesn’t usually cross our mind. On the other hand, the PSTN cannot implement CoS since connections are dedicated, not shared. Packet voice requires an entirely different approach to QoS due to the basic shared nature of data networks. Beginning with Frame Relay, today’s use of Committed Information Rate (CIR) and PVC links provides the bandwidth guarantee required for voice. The flexibility in implementing real-time protocols goes beyond the CIR, in that data and voice can be multiplexed within a given PVC, or they can be separated into dedicated PVCs, depending on the applications used, network design, and cost considerations. In addition, Frame Relay has the flexibility to deal with bursts of traffic during peak hours of the day by configuring the Committed Burst Size (B_C). In this manner the customer can subscribe to a Frame Relay service and order bandwidth based on the average throughput of their network, and at the same time have the flexibility to send traffic above their limit when required.
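
The relationship between the CIR and the committed burst size can be sketched in a few lines. This is a simplified model under stated assumptions: the measurement interval is taken as the committed burst divided by the CIR, and an excess burst allowance (`be_bits`) is introduced here purely for illustration, since the article only names the committed burst. The labels are hypothetical and do not correspond to any particular switch configuration.

```python
from typing import List


def measurement_interval_s(cir_bps: float, bc_bits: float) -> float:
    """Interval over which the committed burst may be sent at the CIR."""
    return bc_bits / cir_bps


def classify_frames(frame_bits: List[int], bc_bits: int, be_bits: int) -> List[str]:
    """Label frames offered within one measurement interval.

    Assumption: traffic up to bc_bits is committed, traffic up to
    bc_bits + be_bits is carried only if the network has spare capacity,
    and anything beyond that is dropped at the access point.
    """
    sent = 0
    labels = []
    for bits in frame_bits:
        sent += bits
        if sent <= bc_bits:
            labels.append("committed")
        elif sent <= bc_bits + be_bits:
            labels.append("burst")
        else:
            labels.append("discard")
    return labels


if __name__ == "__main__":
    interval = measurement_interval_s(cir_bps=64_000, bc_bits=64_000)
    labels = classify_frames([32_000, 32_000, 16_000, 32_000],
                             bc_bits=64_000, be_bits=32_000)
    print(f"measurement interval: {interval:.1f} s")
    print(labels)
```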

Support to the Desktop

PSTN wiring has relied on the tried and true telephone cabling system. This 0.4 mm diameter copper wire pair, also known as 26 American Wire Gauge (AWG), is the foundation for millions of kilometers of local loops around the world. Packet voice breaks away from this dependence on a specific media type by allowing the flexibility to use an entirely new portfolio of wireless or wire-line products. Packet switching allows the use of copper, fiber, coax cable, wireless, or any future media as it becomes available. As a result, corporations can install a cost-effective cabling system which can scale to the requirements of the future, and competitive carriers have the possibility to avoid the high cost of leasing the last mile from local PTTs. With packet voice implementations, the media is no longer a limiting factor. The structure of data communication protocols separates the physical layer from the protocol, thus allowing heterogeneous physical connections end-to-end.

There are fundamental differences in how Frame Relay and IP networks can handle applications. To begin with, Frame Relay standards were modeled after the Open Systems Interconnection (OSI) seven-layer standard. On this basis, we can deduce that each communication layer in this protocol is independent of the others (figure #15). The advantage here is the flexibility of integrating future protocols into the existing model. Although IP is often modeled after OSI, the definitions are not so clear cut, mainly because IP was invented before the OSI model[12]. Regardless, OSI is still used as a theoretical model for protocols today, in order to understand their basic structure. Frame Relay operates independently of OSI layer three (the Network Layer), all the way to layer seven (the Application Layer) (figure #16). Support for standards such as Request for Comment (RFC) 1490 now comes into play by providing the thread which binds Frame Relay to mainstream protocols such as:

Internet Protocol (IP)
Internetwork Packet Exchange (IPX) from Novell
Systems Network Architecture (SNA) from IBM

The implementation of packet voice becomes an issue of supplying a new thread, which binds packet voice to Frame Relay. This specification has already been written in Frame Relay Forum’s FRF.11.

Figure #15. Independence of Frame Relay to the Application layer

The emerging multi-media standard International Telecommunication Union (ITU) H.323 is intended for unguaranteed communications. H.323 is defined up to the desktop and specifies a foundation for audio, video, and data communications across packet based networks such as IP or Frame Relay networks. This standard provides a means by which different vendors may negotiate communication sessions by determining the capabilities of a device (i.e. “Do you have video?”), then activating the features which are available (i.e. “Yes, I have video capabilities. I’m going to send it to you in H.261. Can you read this format…” etc.). H.323 is tightly tied to IP, bringing with it an entire portfolio of applications such as video conferencing, voice conferencing (VoIP), and white boarding.
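
The capability exchange described above can be pictured as choosing the first format that both ends support. The sketch below only illustrates that idea; it does not reproduce the actual H.323 message formats or procedures, and the function name, preference lists, and device capabilities are hypothetical.

```python
from typing import List, Optional, Set


def negotiate(preferred: List[str], remote_supported: Set[str]) -> Optional[str]:
    """Pick the first locally preferred format that the remote device supports."""
    for fmt in preferred:
        if fmt in remote_supported:
            return fmt
    return None


if __name__ == "__main__":
    # Hypothetical local preferences, most compressed audio format first.
    local_audio = ["G.723.1", "G.729", "G.711"]
    local_video = ["H.261"]

    # Hypothetical capabilities announced by the remote device.
    remote_audio = {"G.711", "G.729"}
    remote_video: Set[str] = set()  # "No, I do not have video."

    print("audio:", negotiate(local_audio, remote_audio))
    print("video:", negotiate(local_video, remote_video))
```

When no common format exists for a media type, the function returns None and that medium is simply left out of the session.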

Figure #16. Dependence of IP to the Application layer via H.323

Witnessing the Evolution

From the perspective of a standard telephone conversation, the differences between VoIP and VoFR may not be significant. The contrast between these transport protocols comes down to data support: with non-IP protocols such as SNA, IPX, and an entire series of legacy protocols, Frame Relay begins to flex its strength. The paradigm shift of packet voice has already had a major impact on today’s telecommunications, and that impact will be positive and lasting. But even more exciting will be the new value-added features and enhancements that will bring some added spice to the market. Expect to see compression algorithms squeezing voice through a tiny 2.4 Kbps pipe in the early part of the next decade. As the telecommunications and data communications industries converge, expect to see less expensive services and more choices for the subscriber. Making a phone call in the next decade may prove to be much different from what we are accustomed to today. Our telephone conversation may be sent through the PSTN, ISDN, ATM, Frame Relay, or IP, or any combination, for that matter. But remember to focus on the immediate benefits of packetized voice. This means implementing VoIP or VoFR in key areas of the corporation which are expected to provide short-term financial and competitive benefits. Then move to the next phase.

In closing, multi-media cannot be deployed on existing router networks without an understanding of the Internet Protocol and Frame Relay. By taking the appropriate steps in network design, a cost effective and high quality packet voice solution can be realized. A common misconception is that the existing network requires major changes in order to accommodate packet voice. This is not the case; packet voice becomes a logical extension to any existing private IP or Frame Relay network. The success of implementing packet voice is in the design of the network, especially in managing flow control and in dealing with congestion in real-time communications. Packet voice can be integrated into existing corporate networks as a cost-effective addition.

Motorola provides leading-edge expertise today in both Voice over Frame Relay and Voice over IP. The complete portfolio of features discussed in this article is supported in a single Infinity Multi-service Access Device called Vanguard. The Internet and Networking Group has installed over 70,000 packet voice ports since the beginning of the 90’s. Motorola delivers a comprehensive set of wide area networking solutions through its “Infinity Connections” product portfolio, including hardware, software and managed network services. Motorola has been a world-wide leader in communications for over 70 years. Our portfolio spans international wireless and wire-line markets for the telecom and datacom industries. Throughout the 90’s Motorola Internet Networking Group (ING) has consistently proven its position as a number one vendor of Frame Relay Access Devices, as published by market research organizations such as Dataquest, Vertical Systems, and the Yankee Group.

References

[1] A packet traditionally refers to a unit of information in OSI layer 3 protocols such as X.25 or IP, whereas a frame is an OSI layer 2 unit of information, used in Frame Relay. For the purposes of simplicity this article uses the terms packetized voice or packet voice as a general expression when referring to information units used in transporting voice.

[2] Access to the Local Exchange may be via an analog dial-up, ISDN, or a leased line connection.

[3] Actual numbers will depend on country specific pricing structures.

[4] Registration to the Internet Engineering Task Force is required if the node is directly connected to the Internet.

[5] In reference to data communications we refer to applications as being telephone conversations, e-mail transfer, Internet web browsing, or IBM’s Systems Network Architecture, etc. This is contrary to traditional desktop applications such as Microsoft Word or Adobe Illustrator.

[6] This function is analogous to the leaky bucket algorithm, whereby the bucket represents a memory space, which temporarily holds data as it is poured in from the top and leaks out the bottom.

[7] The Maximum Transmission Unit (MTU) for Ethernet packets.

[8] In actual fact the overall network overhead is slightly greater than the calculations shown, because of the segmentation required in a voice stream. To illustrate, when a fixed voice packet size is used, the last packet will almost always be partially filled, requiring additional bytes as “padding”.

[9] To simplify this model we are assuming that electrons over a telephone or fiber optic cable move at the speed of light (i.e. c = 3.0 x 10⁸ meters per second). In actual fact, electrons travel at a fraction of light speed, based on the characteristics of the chosen communication medium.

[10] Open Systems Interconnection is based on a seven layer theoretical model by which we reference data protocols.

[11] Note that a G.723.1 device cannot speak directly to a G.729 device. A converter would be required to provide the transition from one algorithm to the other. If this functionality is offered, then the Digital Signal Processing (DSP) required may introduce unacceptable delays and distortion, resulting in poor voice quality. In practice, current technology is not powerful enough to provide this type of conversion in real-time.

[12] Vinton Cerf and Bob Kahn detailed the Transmission Control Protocol (TCP).
