At some point, most network engineers will face an issue with MTU, or maximum transmission unit. The first encounter is often an eye-opening and time-consuming effort. After resolving the issue, those with a thirst for knowledge will take the time to understand what actually went wrong.
MTU problems are most often seen when Path MTU Discovery, or PMTUD, fails to function. PMTUD is the process by which an end host determines the largest packet it can send to another station on the network without fragmentation. The typical symptom is that two devices have proven reachability, yet applications fail in ways that point to a network issue. Some applications may even crash or hang the system.
Symptoms of PMTUD Failure
- Hosts may be able to ping one another
- Service/Port may prove accessible using telnet
- Severe and persistent application issues
- Partial page loads
- Either host appearing to hang
Understanding IP MTU
To understand the problems of Path MTU Discovery, it is first necessary to understand how MTU relates to the conversation. MTU, or maximum transmission unit, is the largest chunk of data that a given interface can transmit. The data of interest here is IP packets, so our focus is IP MTU.
The maximum size of an IP packet is technically 65,535 bytes. However, underlying limitations often drive this value down. For example, Ethernet has a typical carrying capacity of 1500 bytes. Since IP packets almost always transit an Ethernet network, they are most commonly 1500 bytes or less. Exceptions can be made by consistently configuring larger frame sizes throughout the network. However, network administrators can only control their own networks.
Other layer two technologies have different MTU limitations. Depending on the implementation, Token Ring can carry 4464 or 17914 bytes. Traditional WAN technologies have start-of-frame and end-of-frame patterns. In those cases, there may be no MTU limitation other than the theoretical maximum IP packet size.
The most common way to connect a host to the network is Ethernet. As a result, many hosts assume they can transmit 1500-byte packets. The challenge comes when there is a lower MTU somewhere in the middle of the network. To be clear, a lower MTU does not necessarily mean less bandwidth. A lower MTU is only a restriction on the packet size and nothing else.
When a host transmits a packet too large to pass an intermediary link, one of two things can happen. First, a router might fragment the packet and then forward the fragments. In this case, the receiving host reassembles the fragments and recreates the original packet. The second possibility is that the router drops the packet and sends a message to the originating host. In that case, the message informs the host of the maximum IP MTU for the link that couldn’t transmit the packet.
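The fragmentation case can be sketched with a little arithmetic. The example below is a minimal illustration, assuming a 20-byte IPv4 header with no options; the function name `fragment` and its return shape are hypothetical and chosen only for this sketch. It relies on the fact that the IPv4 fragment offset field counts in 8-byte units, so every fragment except the last must carry a payload that is a multiple of 8 bytes.

```python
IP_HEADER_LEN = 20  # assumption: IPv4 header without options


def fragment(total_len, mtu, header_len=IP_HEADER_LEN):
    """Split one IP packet into fragments that fit the given MTU.

    Returns a list of (payload_offset_bytes, fragment_total_len, more_fragments).
    """
    if total_len <= mtu:
        # The packet already fits, so it is forwarded unchanged.
        return [(0, total_len, False)]

    payload = total_len - header_len
    # Largest payload per fragment, rounded down to a multiple of 8 bytes.
    max_chunk = (mtu - header_len) // 8 * 8
    if max_chunk <= 0:
        raise ValueError("MTU too small to carry any payload")

    fragments = []
    offset = 0
    while offset < payload:
        chunk = min(max_chunk, payload - offset)
        more_fragments = offset + chunk < payload
        fragments.append((offset, chunk + header_len, more_fragments))
        offset += chunk
    return fragments


# A 1500-byte packet crossing a link with a 1400-byte IP MTU.
for frag in fragment(1500, 1400):
    print(frag)
```

Note that the two fragments together occupy more bytes on the wire than the original packet, because each fragment carries its own IP header.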
Whether the router fragments or drops the packet depends on a special bit in the IP header. This bit, known as the don’t fragment (DF) bit, instructs routers on how to handle oversized packets. If the bit is set to 0, routers fragment packets as necessary. If it is set to 1, the router drops any packet that cannot traverse a link and sends an ICMP message (type 3, code 4) to the originating host. When the originating host receives this message, it reduces the size of the packets it sends to that destination. Additionally, it resends the dropped packet. This is the Path MTU Discovery process.
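The discovery loop described above can be modeled in a few lines. This is a simplified simulation and not a real network stack: the path is assumed to be a plain list of per-hop IP MTUs, every router is assumed to return the ICMP message reliably, and the names `send_with_df` and `discover_path_mtu` are hypothetical.

```python
def send_with_df(packet_len, hop_mtus):
    """Send one packet with the DF bit set across a list of per-hop MTUs.

    Returns ("delivered", None) if every hop can carry the packet, otherwise
    ("frag_needed", mtu) to model ICMP type 3 code 4 from the first hop
    whose MTU is too small.
    """
    for mtu in hop_mtus:
        if packet_len > mtu:
            return ("frag_needed", mtu)
    return ("delivered", None)


def discover_path_mtu(hop_mtus, local_mtu=1500):
    """Shrink the packet size each time a router reports a smaller MTU.

    Returns (path_mtu, sizes_tried).
    """
    size = local_mtu
    sizes_tried = []
    while True:
        sizes_tried.append(size)
        result, reported_mtu = send_with_df(size, hop_mtus)
        if result == "delivered":
            return size, sizes_tried
        # The reported MTU is always smaller than the size just tried,
        # so the loop is guaranteed to terminate.
        size = reported_mtu


# Hypothetical path: Ethernet, a PPPoE link, a tunnel, then Ethernet again.
path_mtu, sizes_tried = discover_path_mtu([1500, 1492, 1400, 1500])
print(path_mtu, sizes_tried)
```

Each smaller link along the path costs the sender one dropped packet and one retransmission, which is why the process only works when the ICMP messages actually reach the originating host.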
Causes of PMTUD Issues
This process of fragmenting or dropping oversized packets is performed at each IP hop in the path. In my experience, there are several common causes of PMTUD issues. Every time the problem presents, there is a combination of a transit link with a lower MTU than the end hosts and something preventing the sending host from realizing it.
Common Causes of PMTUD Failure Include:
- Router with lower MTU Link not generating unreachables (check for “no ip unreachables”)
- Firewall is blocking ICMP type 3, code 4
- NAT/PAT makes flow appear as unique session
- Less layer 2 carrying capacity than a router realizes
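The first two causes in the list have the same effect: the oversized packet is dropped, but the sender never hears about it. The simulation below is a rough sketch of that behavior under simplifying assumptions. Each hop is modeled as a hypothetical `(mtu, sends_unreachables)` pair, a single `icmp_filtered` flag stands in for a firewall anywhere on the return path, and every packet is sent with the DF bit set.

```python
def transmit(packet_len, hops, icmp_filtered=False):
    """Send one DF-marked packet across hops given as (mtu, sends_unreachables).

    Returns "delivered", ("frag_needed", mtu), or "silent_drop".
    """
    for mtu, sends_unreachables in hops:
        if packet_len > mtu:
            if sends_unreachables and not icmp_filtered:
                return ("frag_needed", mtu)
            # The packet is dropped and the sender is never told why.
            return "silent_drop"
    return "delivered"


def session(hops, icmp_filtered=False, local_mtu=1500, retries=3):
    """Model a small handshake packet followed by full-size data packets."""
    handshake_ok = transmit(64, hops, icmp_filtered) == "delivered"
    size = local_mtu
    silent_drops = 0
    while silent_drops < retries:
        result = transmit(size, hops, icmp_filtered)
        if result == "delivered":
            return {"handshake": handshake_ok, "bulk_data": True, "size": size}
        if result == "silent_drop":
            # Nothing was learned, so the retransmission is the same size.
            silent_drops += 1
        else:
            size = result[1]
    return {"handshake": handshake_ok, "bulk_data": False, "size": size}


hops = [(1500, True), (1400, True)]
print(session(hops))                      # PMTUD works, sender settles on 1400
print(session(hops, icmp_filtered=True))  # handshake succeeds, bulk data never arrives
```

The second result matches the symptoms listed earlier: small packets such as pings and handshakes succeed, while anything that fills a full-size packet stalls.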
These types of issues tend to appear as we introduce overlay or encapsulation technologies. I commonly see issues with PPPoE, VPNs and GRE. The important thing to remember is to always test newly instantiated links to confirm reachability at the expected maximum size. A simple ping or telnet test isn’t enough. It is necessary to test all provider links and overlays using the appropriate packet size with the DF bit set.
//1500 Byte Packets Work
R1#ping 188.8.131.52 size 1500 df-bit
Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to 184.108.40.206, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 44/44/48 ms

//1501 Byte Packets Fail
R1#ping 220.127.116.11 size 1501 df-bit
Type escape sequence to abort.
Sending 5, 1501-byte ICMP Echos to 18.104.22.168, timeout is 2 seconds:
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)
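When the expected maximum is not known in advance, the same test can be repeated at different sizes to find the largest packet that passes. The sketch below shows one way to do that with a binary search. It deliberately does not call any real ping command: `probe` is a hypothetical callable that should return True when a DF-marked ping of the given size succeeds, and `fake_probe` is a stand-in used only to make the example runnable. The default lower bound of 68 bytes is the minimum datagram size that every IPv4 hop must forward without fragmentation.

```python
def largest_passing_size(probe, low=68, high=1500):
    """Binary-search the largest packet size for which probe(size) is True.

    Assumes that if a given size passes, every smaller size passes too.
    Returns None if even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        # Round up so the search always makes progress toward `high`.
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def fake_probe(size):
    """Stand-in for a real DF ping; pretends the path MTU is 1476 bytes."""
    return size <= 1476


print(largest_passing_size(fake_probe))
```

In practice the probe would wrap whatever ping tool is available on the test host. Keep in mind that some tools take the payload size and others the full IP packet size, so the result may need to be adjusted by the header lengths.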
Recognizing PMTUD Issues
In addition to testing, recognizing the symptoms of a PMTUD failure is important. Something that once tested correctly might have issues after someone makes a change. Recognizing this type of issue first involves confirming connectivity. Using a telnet session to the service port and watching for a three-way handshake is a good way to verify connectivity. If this is successful but the application demonstrates an obvious connectivity issue, a PMTUD failure is a possibility.
Path MTU Discovery issues can create widespread chaos in a modern network. These issues are most likely to occur when using TCP-based services over transit links with an MTU below 1500 bytes. Those who have experienced the pain of troubleshooting a large-scale PMTUD failure will be much quicker to consider this possibility when networked applications start misbehaving in strange ways.