Performance - Latency, Throughput etc

We ran a latency test of the QWFIX Trading System in three flavors: the regular .Net version, the regular Java version, and the RTSJ version running on real time Java.

 

Setup

Software

We run two processes on the same machine: a server process and a client process. There are 5 FIX sessions between the client and the server, running FIX versions from 4.1 to 5.0SP1. Both processes are built with the QWFIX SDK and the QWFIX order management API.

Please note that in this benchmark we measure the latency of the entire system, not just the latency of the FIX engine. The reported latency is the latency of the FIX engine plus that of the order management system.

Because it is difficult to synchronize time sources across machines, we run both the client process and the server process on the same machine and measure the round trip latency on the client side.

Without loss of generality, each time the client sends one "New Order Single" FIX message to a session selected at random from the 5 available sessions. When the server receives the "New Order Single" message, it performs all necessary checks, such as the ClOrdID check (to avoid duplicate ClOrdIDs), and does all the bookkeeping as well. The server then generates an acknowledgement "Execution Report" for that specific order and sends it back. When the client receives the "Execution Report", it records the latency for that single order and repeats the process by sending another order.
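
The measurement loop described above can be sketched roughly as follows. This is a minimal illustration, not the QWFIX API: the Session interface, the "ACK:"/"REJECT:" reply strings and the in-process stand-in server are all hypothetical names introduced here, and no real FIX encoding or networking takes place. The point is only that both timestamps are taken on the client, from the same clock.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

final class RoundTripLoopSketch {
    /** Hypothetical stand-in for one FIX session: sends an order and blocks until the reply arrives. */
    interface Session {
        String sendAndAwaitReply(String clOrdId);
    }

    /** Sends orders one at a time to randomly chosen sessions; returns round trip latencies in microseconds. */
    static long[] run(Session[] sessions, int orders, long seed) {
        Random random = new Random(seed);
        long[] micros = new long[orders];
        for (int i = 0; i < orders; i++) {
            Session session = sessions[random.nextInt(sessions.length)];
            String clOrdId = "ORD-" + i;
            long start = System.nanoTime();                    // client-side clock only
            String reply = session.sendAndAwaitReply(clOrdId);
            long elapsedNanos = System.nanoTime() - start;
            if (!reply.equals("ACK:" + clOrdId)) {
                throw new IllegalStateException("unexpected reply: " + reply);
            }
            micros[i] = elapsedNanos / 1_000;
        }
        return micros;
    }

    /** In-process stand-in for the server (an assumption): rejects duplicate ClOrdIDs, acknowledges the rest. */
    static Session inProcessServer() {
        Set<String> seen = new HashSet<>();
        return clOrdId -> seen.add(clOrdId) ? "ACK:" + clOrdId : "REJECT:" + clOrdId;
    }

    public static void main(String[] args) {
        Session server = inProcessServer();
        Session[] sessions = {server, server, server, server, server}; // 5 sessions, one server
        long[] micros = run(sessions, 100_000, 42L);
        long total = 0;
        for (long m : micros) {
            total += m;
        }
        System.out.println("average round trip (us): " + (double) total / micros.length);
    }
}
```

Because the stand-in server runs in the same process, the numbers this sketch prints say nothing about a real system; only the shape of the loop carries over.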

In the test, 1 million "New Order Single" messages are sent from the client, and in turn 1 million "Execution Report" messages are sent by the server. The client records 1 million latency numbers in microseconds.
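
As an illustration of how a set of recorded latencies could be summarized, here is a small sketch that computes the average and a nearest-rank percentile over an array of microsecond samples. The class name, the method names and the sample values in main are made up for this example; they are not measurements from our test.

```java
import java.util.Arrays;

final class LatencySummary {
    /** Arithmetic mean of the samples, in microseconds. */
    static double average(long[] micros) {
        long total = 0;
        for (long m : micros) {
            total += m;
        }
        return (double) total / micros.length;
    }

    /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
    static long percentile(long[] micros, double p) {
        long[] sorted = micros.clone();   // leave the caller's array in recording order
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] samples = {600, 650, 700, 2_500, 18_000}; // illustrative values only
        System.out.println("average (us): " + average(samples));
        System.out.println("median (us):  " + percentile(samples, 50));
        System.out.println("max (us):     " + percentile(samples, 100));
    }
}
```

An average alone hides rare spikes, so it helps to report high percentiles and the maximum next to it.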

Hardware and OS

The hardware is a workstation with one 2.67GHz Intel Core i7 920 processor, 12GB of memory and a 1.5TB Seagate hard drive.

QWFIX.Net runs on 64 bit Windows Vista Ultimate. The QWFIX_J and QWFIX_RTSJ versions run on 64 bit OpenSuse 11.1 Linux with a realtime kernel.

 

Latency Result for QWFIX_RTSJ

As expected, QWFIX_RTSJ running in a Sun real time Java VM offers the best latency numbers.

As shown in the diagram below, QWFIX_RTSJ offers an average round trip latency of 633 microseconds (0.633 milliseconds).

Note that each round trip includes two FIX message encodings, two FIX message decodings and the business logic of order management, plus the network overhead between two processes running on a single machine.

Jitters - Initial

The first 30 FIX messages of each FIX session have slightly higher latency: the first message takes about 18ms, and the remaining messages take about 2ms-3ms per round trip. This is caused by the JIT (just-in-time) compiler of the virtual machine, and the behaviour is observed on both the .Net and Java virtual machines. The JIT has to compile code dynamically as it is executed, because it is impractical to compile the entire runtime environment at start time.

RTSJ has a way to pre-record likely execution paths and pre-compile everything along those paths during startup.

We also have a simple, if slightly amusing, workaround: send 30 dummy orders at the beginning of the day and cancel them immediately. It works very well.
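
The warm-up workaround could look roughly like the sketch below. The OrderGateway interface and its method names are hypothetical and exist only for this illustration; they are not the QWFIX order management API. The idea is simply to drive the new order and cancel code paths on every session before real traffic arrives, so that the JIT has already compiled them.

```java
import java.util.ArrayList;
import java.util.List;

final class WarmUpSketch {
    /** Hypothetical view of one FIX session; the method names are illustrative, not the QWFIX API. */
    interface OrderGateway {
        void sendNewOrder(String clOrdId);
        void cancelOrder(String clOrdId);
    }

    /** Sends the given number of dummy orders on one session and cancels each one immediately. */
    static void warmUp(OrderGateway session, int dummyOrders) {
        for (int i = 0; i < dummyOrders; i++) {
            String clOrdId = "WARMUP-" + i;   // recognizable prefix so dummy orders are easy to filter out
            session.sendNewOrder(clOrdId);
            session.cancelOrder(clOrdId);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        OrderGateway recording = new OrderGateway() {
            @Override public void sendNewOrder(String clOrdId) { log.add("NEW " + clOrdId); }
            @Override public void cancelOrder(String clOrdId) { log.add("CANCEL " + clOrdId); }
        };
        for (int session = 0; session < 5; session++) {   // warm every session, since each one showed the effect
            warmUp(recording, 30);
        }
        System.out.println("warm-up messages sent: " + log.size());
    }
}
```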

Jitters - 2ms - 4ms Round Trip Latency (1 in 100,000 messages)

We periodically observed slightly higher latency during the test. About one message in every 100,000 has a round trip latency between 2ms and 4ms.

Extreme Jitters - >4ms - 400ms (1 in a million chance)

Of the 4 million FIX messages we processed, 4 incurred much higher latencies. Those latencies are far out of line, ranging from tens of milliseconds to hundreds of milliseconds. Fortunately, such extreme latency spikes are very rare: expect about one for every 1 million FIX messages.

We suspect that the one-in-a-million latency spikes are triggered by the file system, which is outside the application's control. We believe such latencies are the result of a system-level flaw and cannot be avoided for now, no matter what programming language you use.
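
To make the jitter categories concrete, the following sketch counts recorded latencies in three buckets using the thresholds from the headings above: below 2ms (normal), 2ms to 4ms (jitter), and above 4ms (extreme jitter). The class name, constants and sample values are illustrative assumptions, not part of QWFIX and not measured data.

```java
final class JitterBuckets {
    static final long JITTER_US = 2_000;    // 2 ms, lower bound of the jitter band
    static final long EXTREME_US = 4_000;   // 4 ms, anything above this is extreme jitter

    /** Returns counts as {normal, jitter (2ms-4ms), extreme (>4ms)} for latencies given in microseconds. */
    static long[] classify(long[] micros) {
        long[] counts = new long[3];
        for (long m : micros) {
            if (m > EXTREME_US) {
                counts[2]++;
            } else if (m >= JITTER_US) {
                counts[1]++;
            } else {
                counts[0]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] samples = {633, 1_999, 2_000, 4_000, 4_001, 400_000}; // illustrative values only
        long[] counts = classify(samples);
        System.out.println("normal=" + counts[0] + " jitter=" + counts[1] + " extreme=" + counts[2]);
    }
}
```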

Benchmark_RTSJ_Latency.gif

 

QWFIX .Net

As expected, QWFIX .Net cannot provide any latency guarantee. Currently .Net only comes with a regular garbage collector, which periodically "pauses the world" to do garbage collection. About 2% of all messages suffer a round trip latency between 2ms and more than 100ms.

However, .Net offers much higher throughput than Java. On the same machine, we observed an average round trip latency of about 0.33ms, 60% better than QWFIX_J and QWFIX_RTSJ.

QWFIX.Net and QWFIX_J follow exactly the same design. As a matter of fact, we did more optimization in the Java version than in the .Net version. Even so, the .Net version still runs much faster than the Java version on all platforms.

 

QWFIX_J

QWFIX_J doesn't offer good latency, either. During the test, we found that its performance is better on Linux than on Windows Vista.

 

Discussion - QWFIX_RTSJ vs. QWFIX_J

Real time Java (RTSJ) is fantastic. It delivers the performance exactly as Sun promised, and it is very well suited to extremely high frequency trading.

Our QWFIX_RTSJ has exactly the same design as QWFIX.Net and QWFIX_J. The source code of QWFIX_RTSJ is almost the same as QWFIX_J. The difference is less than 50 lines of code.

Now the question is: Why don't we completely replace QWFIX_J with QWFIX_RTSJ?

Well, QWFIX_RTSJ does have one drawback, and only one. As of right now Sun only offers a 32 bit version of RTSJ. In practice we can only allocate about 3GB of memory to RTSJ enabled processes. We can cache about 6 million FIX messages in about 2GB of memory and leave the remaining 1GB for other uses. That is pretty much the upper limit of the capacity of QWFIX_RTSJ per process, which translates to slightly over 1 million orders.
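
The capacity figures above imply a rough per-message memory footprint and a rough number of messages per order. The sketch below only redoes that arithmetic; it assumes "2GB" means 2 x 1024^3 bytes and rounds the order count down to exactly 1 million, so the results are ballpark values rather than measurements.

```java
final class CapacityEstimate {
    /** Average cache bytes consumed per FIX message. */
    static double bytesPerMessage(long cacheBytes, long messages) {
        return (double) cacheBytes / messages;
    }

    /** Average number of cached FIX messages per order. */
    static double messagesPerOrder(long messages, long orders) {
        return (double) messages / orders;
    }

    public static void main(String[] args) {
        long cacheBytes = 2L * 1024 * 1024 * 1024;   // assumption: 2GB = 2 x 1024^3 bytes
        long messages = 6_000_000L;                  // "about 6 million FIX messages"
        long orders = 1_000_000L;                    // "slightly over 1 million orders", rounded down
        System.out.printf("bytes per message:  about %.0f%n", bytesPerMessage(cacheBytes, messages));
        System.out.printf("messages per order: about %.0f%n", messagesPerOrder(messages, orders));
    }
}
```

Under these assumptions the cache works out to roughly 358 bytes per message and a little under 6 messages per order.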

QWFIX_J is optimized for both 32 bit and 64 bit virtual machines. On 64 bit Java the capacity of QWFIX_J is virtually unlimited.

 

Discussion - QWFIX_RTSJ vs. QWFIX.Net

First of all, they use different technologies, .Net vs. Java.

In our observation, the .Net virtual machine is generally faster than the Java VM, at least for QWFIX products.

However, .Net is only available on the Windows platform. Even though Mono has become fairly mature on other platforms, its adoption rate for trading applications is still quite small.

Many factors need to be considered when choosing a platform or technology. We are well prepared to offer all of these flavors to our customers.

Neither Windows nor .Net is a hard real time system for now. First of all, the Windows kernel, like a regular Linux kernel, is not fully preemptive: it cannot guarantee that a high priority job will preempt a low priority job at any time. A Linux kernel can be made fully preemptive by applying the appropriate kernel patch, but there is no such option on the Windows platform.

Are Windows and .Net completely out of the running for high frequency trading? Not necessarily. Even though Windows is not a hard real time OS, it is still very fast and can serve very well as a soft real time OS. Next year Microsoft will release .Net 4.0, which will feature a new type of GC called "background GC", which is very similar to the real time GC used in RTSJ. It will be interesting to see how QWFIX.Net performs on .Net 4.0.

 

Conclusion

Remember that even with RTSJ we still have jitters, even though they happen very rarely. We are dealing with a fairly complicated system and nothing can be guaranteed. That is the reality. Theoretically, even a sneeze may affect the vibration of your hard drive and thus affect the transfer rate.

Fortunately, we have a way to test the performance of our system and are quite confident about how it behaves. After all, in the trading world we deal with much higher uncertainty. We provide enough information to our customers, and it is up to them to choose which technology is best for them.

QWFIX is a general purpose, FIX protocol based trading system. With about 50 lines of code one can manage the entire intra-day order flow.

For information about the design of QWFIX Trading System, please read our design white paper.