Best practice – 5 ways to troubleshoot cloud services and applications – part 3

Posted 4 September 2017


This is already the 3rd post (in a series of 5) in which you can learn from our experience as TCP relationship therapists for cloud services, applications and networks; time flies when you are having fun! 🙂

All 5 posts are also available in one convenient PDF; click here to get your copy.

In the previous post you learned how to identify and analyze the not-so-healthy starts of a relationship, as well as how they impact the user experience. The case study about a health check on a Citrix/SAP environment clearly shows how such behavior between the Citrix client and the NetScaler even impacts the complete application chain!

In this post you will learn how to identify and analyze the not-so-healthy endings of a relationship and the impact this has on the user experience. Another case study about a Health-check on the Dutch “Daisylezer”-app as well as the underlying streaming media platform explains in great detail how such behavior impacts the app and its user experience.

As seen in the 1st post, the recommended, healthy ending of a relationship is when both the client and the server send a FIN, meaning both parties tell each other that there is nothing else to do. Each side then confirms the other's FIN with an ACK and deletes the details from its session table, which frees up resources and keeps the host responsive. Sending an ACK can be delayed if some data is still in flight.

A healthy ending of a relationship since both hosts are using a FIN-ACK
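As a rough illustration of this healthy ending from the application's point of view, here is a minimal Python sketch on the loopback interface. Everything in it is an assumption made for the example (the function names `serve_once` and `graceful_exchange`, the payloads, the 5-second timeouts); it is not taken from any of the products discussed in this series. Each side calls `shutdown(SHUT_WR)`, which makes the TCP stack send a FIN, and each side learns about the other's FIN when `recv()` returns an empty result.

```python
import socket
import threading


def serve_once(listener, received):
    """Accept one connection, read until the client's FIN, reply, then send our own FIN."""
    conn, _ = listener.accept()
    with conn:
        conn.settimeout(5)
        while chunk := conn.recv(4096):   # recv() returns b"" once the client's FIN arrives
            received.append(chunk)
        conn.sendall(b"bye")              # our direction is still open, so data can still flow
        conn.shutdown(socket.SHUT_WR)     # server FIN: nothing else to send


def graceful_exchange():
    """Run one client/server exchange on loopback and return (server_got, client_got)."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.settimeout(5)
    received = []
    worker = threading.Thread(target=serve_once, args=(listener, received))
    worker.start()

    client = socket.create_connection(listener.getsockname(), timeout=5)
    client.sendall(b"hello")
    client.shutdown(socket.SHUT_WR)       # client FIN: nothing else to send
    reply = b""
    while chunk := client.recv(4096):     # loop ends when the server's FIN arrives
        reply += chunk
    client.close()
    worker.join(timeout=5)
    listener.close()
    return b"".join(received), reply
```

Capturing the loopback traffic while calling `graceful_exchange()` should show a FIN from each side, each confirmed by an ACK.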

Analyzing healthy endings of a relationship can be somewhat complicated when an application is using the RST-flag as a way of ending things. The problem with this is that such an application is not able to recognize the RST that is part of an attempt to revive an existing relationship. Microsoft applications are well-known for this (mis-)behavior when using SSL.
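To make "using the RST flag as a way of ending things" concrete, the sketch below shows one common way an application can cause this on purpose: enabling `SO_LINGER` with a zero timeout before closing the socket. This is only an illustration and not a claim about how any particular Microsoft application is implemented. The helper names `abortive_close` and `peer_sees_reset` are made up for this example, and the `struct` layout of the linger option assumes a Linux or other Unix-like system.

```python
import socket
import struct


def abortive_close(sock):
    """Close with SO_LINGER enabled and a 0-second timeout, so the stack sends RST, not FIN."""
    # struct linger { int l_onoff; int l_linger; } as laid out on Linux/Unix (assumption)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()


def peer_sees_reset():
    """Return True if the other side observes a reset instead of a normal end-of-stream."""
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname(), timeout=5)
    conn, _ = listener.accept()
    conn.settimeout(5)
    abortive_close(client)
    try:
        conn.recv(4096)      # a FIN would show up here as b""
        return False
    except ConnectionResetError:
        return True          # the RST surfaces as "connection reset by peer"
    finally:
        conn.close()
        listener.close()
```

In a capture, an ending like this shows a single RST from the closing side and no FIN at all, which is why it is hard to tell apart from a RST sent for other reasons.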

Relationship revival is the situation where a host is repeatedly sending messages with a higher sequence number than expected by the receiving party. Eventually the receiving party will send a RST indicating that it wants to reset its sequence numbering while keeping the relationship alive.

If the sending party interprets this RST as a relationship-end, the host will not only send an ACK but will also delete the details from its session table.

A not-so-healthy end of a TCP relationship

The receiving party now believes the other side is in good shape and keeps its channels open for new messages. From here on several things can happen; good and not-so-good. But whatever they are, it is not a healthy situation in the long run, as this will slow down the application at irregular intervals due to a mismatch in the interpretation of the RST flag!

The only way to validate that this is indeed happening, is a kind of triggered message capture approach. A triggered message capture gives you the flexibility to use different filters compared to the ones that were picked from the list in a dashboard.

The advanced options allow you to have this running for a longer period of time, depending on the number of messages, file size and message slicing. In this case the limits are up to 60 minutes, up to 10k messages and a file size of up to 50 MB; whichever comes first.
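The "whichever comes first" rule can be sketched in a few lines of Python. This is not the capture tool's actual logic; `CaptureLimits`, `CaptureState` and `stop_reason` are hypothetical names, and reading 50 MB as 50,000,000 bytes is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaptureLimits:
    max_seconds: float = 60 * 60       # up to 60 minutes
    max_messages: int = 10_000         # up to 10k messages
    max_bytes: int = 50 * 1000 * 1000  # up to 50 MB (decimal megabytes assumed)


@dataclass
class CaptureState:
    elapsed_seconds: float = 0.0
    messages: int = 0
    bytes_written: int = 0


def stop_reason(state: CaptureState, limits: CaptureLimits) -> Optional[str]:
    """Return which limit was reached, or None while the capture may continue."""
    if state.elapsed_seconds >= limits.max_seconds:
        return "time"
    if state.messages >= limits.max_messages:
        return "messages"
    if state.bytes_written >= limits.max_bytes:
        return "size"
    return None
```

A capture loop would update the state after every stored message and stop as soon as `stop_reason` returns something other than `None`.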

Starting a triggered message trace

Because of these filters and advanced options, there is no need to invest in dozens of terabytes of disk space to store all the messages for all the cloud services and applications.

The outcome is a small PCAP file because only the messages that match the filter are stored. This means you don’t need to process dozens of terabytes of messages in an attempt to find those few “needles” that are part of the troublesome cloud service or application.
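Once the messages of a single session have been extracted from such a small capture, telling the endings apart is mostly a matter of looking at which side sent which flag. The sketch below assumes a simplified, hypothetical representation in which each message is a `(sender, flags)` pair; reading the PCAP file itself is out of scope here, and the labels returned by `classify_ending` are invented for this example.

```python
from typing import FrozenSet, Iterable, Tuple

# (sender, TCP flags) is a hypothetical, simplified view of one captured message
Packet = Tuple[str, FrozenSet[str]]


def classify_ending(packets: Iterable[Packet]) -> str:
    """Label how a single session ended, based only on FIN and RST flags."""
    fin_senders = set()
    for sender, flags in packets:
        if "RST" in flags:
            return "reset"          # ended (or interrupted) by a RST
        if "FIN" in flags:
            fin_senders.add(sender)
    if len(fin_senders) >= 2:
        return "graceful"           # both sides said there is nothing else to do
    if len(fin_senders) == 1:
        return "half-closed"        # only one side ended the relationship
    return "open"


session = [
    ("client", frozenset({"FIN", "ACK"})),
    ("server", frozenset({"ACK"})),
]
print(classify_ending(session))  # half-closed
```

Sessions labelled "reset" or "half-closed" are the ones worth a closer look in the capture.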

The case study about the Dutch “Daisylezer”-app from Dedicon clearly showed how this approach helped in finding the root cause of these not-so-healthy endings of a relationship. Here, the symptoms were a partial end of an existing relationship, followed by a half-duplex connection when the client starts a new one. From there on, users were experiencing delays and errors when doing 2 or more actions in quick succession (i.e. without waiting for completion).

These symptoms were then used by the developers for a review of their code for starting and ending relationships. The outcome was that when the client ended a relationship, there were still some leftovers in the session table. As a result, setting up a new connection resulted in a 1-channel, half-duplex relationship (as opposed to 2-channel, full-duplex). Over time the session table filled up with these leftovers, which in the end made the app fail and restart.

This brings you to the end of the 3rd post, in which you learned how these not-so-healthy endings of a relationship impact the user experience.

What is covered in the next episode

The upcoming post is mainly about 6 easy steps for discovering the low-hanging fruit on the way to healthy relationships (again):

How to get healthy relationships (again) (part 4)
How this works for streaming, multimedia type of applications (part 5)

As mentioned earlier, all 5 posts are also available in one convenient PDF; click here to get your copy.