go to Elijah Laboratories Inc Home Page

VITO: Visualizing Internet Traffic Online

R. J. Brown -- Elijah Laboratories Inc.

See your network traffic

The VITO project is all about seeing your internet traffic -- where it comes from, where it goes, how it changes over time. As much as possible, it is about visualizing that traffic by means of various graphical representations. The desire is to have near real-time data displays of network traffic. The goal is to permit the network administrator to quickly and easily grasp whether the network is behaving "normally", or "abnormally".

Most people think in terms of images. A person can look at these graphs and easily determine whether the network is behaving much as it usually does, or if it is misbehaving. In the case of misbehaviour, with experience, the network engineer or administrator will recognize patterns in the displays that he has seen before, and thus be able to quickly diagnose the cause of problems.

Rapid prototyping

This project is in its infancy. The present displays are all near real-time except the traffic-by-location display, which is just a mock-up to show what the real thing would look like. The other displays are live, and accurately represent the state and recent history of the network traffic on the Elijah Laboratories LAN (where the workstations and network monitoring system reside) and on the DMZ (where the internet servers reside).

The present displays are implemented via the technique of rapid prototyping. This seems reasonable, considering that we are not sure what we want until we see it and use it for a while to determine its usefulness. Frequently the first attempt at a new display is not the best representation; several iterations of modification and observation are required to get it right.

Because of the rapid prototyping technique of preliminary development, most of the work has been hacked with a bunch of rather ugly shell scripts. Data file formats for intermediate forms of the traffic data were developed in a very ad hoc manner to solve the immediate need of the display being developed. This needs to change.

Traffic by location

Work then began on the traffic-by-location display, which attempts to generate a 3D histogram showing where the packets come from or go to, not in terms of IP address space, but on a world map. Hacking this as a collection of shell scripts is insane. The need for a more expressive and maintainable language became apparent, and this pushed the project into using Perl, which should also run faster than shell scripts.
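The binning behind such a histogram can be sketched in a few lines. This is only an illustration, not the project's code: the function names, the 10-degree cell size, and the (latitude, longitude, packets) sample format are all assumptions made for the example.

```python
from collections import Counter

def grid_cell(lat, lon, cell_deg=10.0):
    """Map a latitude/longitude pair to the (row, col) index of a grid cell.

    Row 0 starts at latitude -90 and column 0 at longitude -180.
    The cell size is an assumed parameter, not something the project specifies.
    """
    rows = int(180 / cell_deg)
    cols = int(360 / cell_deg)
    # Clamp so that lat == 90 or lon == 180 lands in the last cell, not past it.
    row = min(int((lat + 90.0) // cell_deg), rows - 1)
    col = min(int((lon + 180.0) // cell_deg), cols - 1)
    return row, col

def location_histogram(samples, cell_deg=10.0):
    """Sum packet counts per grid cell; samples are (lat, lon, packets) tuples."""
    bins = Counter()
    for lat, lon, packets in samples:
        bins[grid_cell(lat, lon, cell_deg)] += packets
    return bins
```

Each non-empty cell would then become one bar of the 3D histogram, with the packet total as its height.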

The traffic-by-location display also made it apparent that something better than ad hoc flat ASCII files was needed for intermediate data storage. The next phase of the work will migrate to PostgreSQL for data storage. This will also provide persistence that will permit more robust recovery from system crashes, power outages, etc.

Project status

So the project status is that a few displays are up and running live, but they are hacked as traditional shell scripts, using bash, grep, awk, and the usual assortment of miscellaneous utilities. They use flat ASCII sequential files for intermediate data storage, and they are CPU-intensive.

A problem recently arose as a result of the activities of a spammer. The spammer obtained an account on one of the servers located on the DMZ. Although his account was only active for a total of 6 hours, he generated an enormous volume of outgoing emails, many of which bounced. Even after his account was disabled and the attacked server was taken off-line, the traffic resulting from bounced messages, together with active attacks from systems that received the spam and took vengeance on the originating server, resulted in a flow of about 100 packets per second aimed at the now off-line server.

This high volume of traffic caused the packet log files to overflow the 2 GB max file size of the Linux operating system. As a result, the OS of the IDS was changed to FreeBSD, which has a much larger size limit. This caused numerous problems with the scripts that ran the packet logging and graphing package. All these problems had to be corrected to get the software back on-line.

Future development

The near-term project goal is to rewrite the present displays in Perl, using the PostgreSQL RDBMS for intermediate storage. This should allow much improved maintainability, greater sharing of data structures between displays, and more efficient utilization of CPU resources.

In order to implement the traffic-by-location display, it is necessary to translate an IP address into the latitude and longitude of the system that has that IP address. This is no small feat. It requires querying several whois servers and several web sites to determine the city, state, and country where an IP address is located, and then querying several additional web-based gazetteers to look up the latitude and longitude of those cities.

All of this needs to be automated so that it is as easy to use as a typical subroutine. Because these network queries over the web take a considerable amount of time, the results of all earlier queries will need to be cached, so that future queries can consult the cache and avoid going out over the web if that query has been made before. There are no standard interfaces to any of these web-based systems.
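The caching idea can be sketched as a thin wrapper around whatever slow lookup is eventually written. The class name, the `slow_lookup` callable, and the choice to cache failed lookups as well are assumptions made for illustration; the real whois and gazetteer queries are not shown here.

```python
class CachedLocator:
    """Wrap a slow IP-to-location lookup so each address is resolved only once."""

    def __init__(self, slow_lookup):
        # slow_lookup is a hypothetical callable: ip string -> (lat, lon) or None.
        # In practice it would hide the whois and gazetteer queries.
        self._slow_lookup = slow_lookup
        self._cache = {}
        self.misses = 0  # how many times we actually had to go out over the web

    def locate(self, ip):
        """Return the cached answer if there is one, else query and remember it."""
        if ip not in self._cache:
            self.misses += 1
            # Failed lookups (None) are stored too, so they are not retried.
            self._cache[ip] = self._slow_lookup(ip)
        return self._cache[ip]
```

From the caller's side this behaves like an ordinary subroutine: `locator.locate(ip)` returns a coordinate pair or `None`, and only the first call for a given address pays the network cost. An in-memory dictionary is lost on restart, so a persistent store would be needed for the cache to survive crashes.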

It also becomes necessary to maintain packet counts in real-time for every IP address that is in communication with the network being monitored. The requirement to maintain a local cache of cities that have already been looked up, as well as a local cache mapping IP addresses to latitude and longitude, is what forced the realization that a database would be needed. Since we will have a database available, it seems like a good idea to keep packet counts by city, state, and country as well.
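One possible shape for these tables is sketched below. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for PostgreSQL, and the table names, column names, and helper functions are hypothetical; the project has not published a schema.

```python
import sqlite3

def open_store():
    """Create an in-memory database with one location row and one counter per IP."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ip_location (
            ip TEXT PRIMARY KEY, city TEXT, state TEXT, country TEXT
        );
        CREATE TABLE packet_count (
            ip TEXT PRIMARY KEY, packets INTEGER NOT NULL DEFAULT 0
        );
    """)
    return conn

def count_packets(conn, ip, n=1):
    """Add n packets to the running total for ip, creating the row if needed."""
    conn.execute("INSERT OR IGNORE INTO packet_count (ip, packets) VALUES (?, 0)", (ip,))
    conn.execute("UPDATE packet_count SET packets = packets + ? WHERE ip = ?", (n, ip))

def packets_by_country(conn):
    """Roll the per-IP counters up to country level via the location table."""
    rows = conn.execute("""
        SELECT l.country, SUM(p.packets)
        FROM packet_count p JOIN ip_location l ON l.ip = p.ip
        GROUP BY l.country ORDER BY l.country
    """)
    return rows.fetchall()
```

Keeping the counters per IP address and joining against the location table means that totals by city or state need only a different `GROUP BY` column, not a separate set of counters.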

Software availability

Once all this work results in something somewhat stable and hopefully useful, it will be released under the GPL for use by other network administrators. Anyone interested in helping with the development should contact Bob Brown.


Robert J. Brown
Last modified: Tue Feb 19 16:20:08 CST 2002