The Right (and Wrong) Way to Run an Algorithmic Trading Group

I would like to share with you the unique vision that Numerical Method Inc. has about running an algorithmic trading group. To get an edge over competing funds, we emphasize on 1) the research process and 2) technology rather than on hiring more intelligent people.

Currently, the majority of the quant funds run like cheap arcade booths: the traders are given workstations and data. They do whatever to crank out strategies. The only contributing factor to profit is luck – luck in finding the right people and/or luck in finding the right strategies. I had a conversation with an executive from a large financial organization two years ago when they started to build an algorithmic trading group. He said, “Haksun, we need to hire some very smart people to be better than Renaissance.” I repeatedly hear something similar from various portfolio managers and hedge fund owners.

Staffing “very smart” people in this ad-hoc, unmethodical, non-scientific process in search of profitable trading strategy is merely a lottery in disguise.

  1. The main reason is that there is no necessary condition between “very smart” people and “very profitable” trading strategies. If there is any relationship, it can only be a sufficient condition.
  2. It is very difficult to hire the “very smart” people because
    1. They are difficult to be identified among numerous pretentiously smart people.
    2. The competition for them is a very fierce battle. The best examples are the fight between Microsoft and Google over Dr. Lee Kai-Fu, the dispute between Renaissance Technologies and Millennium Partners over two former employees.
  3. The very best people are driven by passion rather than money, e.g., Gates, Jobs. They cannot be hired.

The key to success in running an algorithmic trading group, as in warfare, lies in process (tactics) and technology (weapon), not on star traders. Analogously, knights, despite being elite warriors, were superseded by cheap infantry; German tanks, despite being best engineered during WWII, were outnumbered by cheap USSR tanks. The better traders do not get their profitable strategies from higher beings. Speaking as an AI scientist, they are “better” only because their search involves a bigger space (more knowledge) and is faster (more efficient). We have created a process and a technology that enables a good trader to be just as profitable as the “very smart” traders.

Process

In terms of the algorithmic trading research process, there is usually very little standardization even within one firm, let alone in the industry. Most quant fund houses do not invest in building research technology (with the famous exception of Blackrock).

This is best illustrated by the languages they use to do backtesting. When there are 6 traders, they could be using: Matlab, R, VBA, C++, Java, C#. The first consequence of this using-their-best-language is that there is absolutely no sharing of code. This firm would write the same volatility calculation function 6 times. Suppose trader A comes up with a new way of measuring volatility, trader B could not leverage on this. Trader C could not quickly prototype a new trading idea by combining the mean-reverting signal from trader D and the trend following signal from trader E. The productivity is very low.

The management is not able to compare strategies from two traders. The 6 traders all make different “simplifications” in backtesting. For instance, they would clean the same data set in different way; they would use their “proprietary” bid-ask, execution, (market vs. limit) order models for computing historical P&L; they would make assumptions about liquidity and market impacts. Mainly due to simplifying coding, they would make all sorts of guesses about the details in executing their strategies. The management does not have the time to question every single detail in their backtesting, hence the lack of understanding and confidence. They would simply resort to “trusting” the reports.

Worst, while algorithmic traders maybe excellent mathematicians, they are usually bad programmers. I am yet to see an algorithmic trader who understands modern programming concepts such as interface vs. inheritance, memory model, thread safety, design pattern, testing, and software engineering. They usually produce code that is very long, unstructured and poorly documented. The code must have bugs. Therefore, the management cannot trust their backtesting result and the performance report.

Our solution to make systematic the trading research process has three steps. Firstly, we mandate that all traders do their backtesting in the same language, e.g., Java. Secondly, we mandate that all traders contribute their code to a research library. Thirdly, the firm invests in a common backtesting infrastructure by expanding this research library. The advantages are the following.

  1. The traders can focus on what they are supposedly good at – generating innovative trading strategies. They no longer bother with the IT grunt work.
  2. They can quickly prototype a trading idea by putting together components, e.g., signals, filters, modules, from the research library.
  3. They can share code with colleagues and be understood because all conform to the same standard.
  4. They can compare strategies because the performance measures are computed from simulations making the same assumptions.

Over time, as the algorithmic trading firm invests, creates as well as expands the research library and backtesting system, this IT infrastructure will become the most valuable asset. Suppose a star trader takes 3 month to test a good idea and make it profitable. With this standardized process and technology, a good trader is able to rapidly prototype 30 strategies in 3 days in parallel on a cluster of computers. The profitability of the firm depends not on hiring Einstein but on good and hardworking people leveraging on using the infrastructure.

Technology

There are many vendors that sell backtesting tools. However, IMHO, these backtesters are no more than augmented versions of for-loops – looping over historical data. Some better ones come with various features, e.g., importing data, cleaning data, statistics and signal computation, optimization, real-time paper trading. Some even go one step further to provide brokerage services for real trading. The major problem with all these backtesters is that you cannot code, hence simulate, true quantitative trading strategies with them. Suppose you want to replicate Elliott and Malcolm’s pair trading strategy, you will need to do SDE, EM, MLE, and KF. Most commercial backtesters are built for amateurs and simply cannot do it.

There are a few more professional backtesters that provide link to Matlab or R for data analytics. I have a lot of complains about doing data analysis in Matlab/R. Many traders use them only because they cannot code. (It is extremely rare to find someone who is mathematically talented and can code.) Matlab/R is for amateurs; VBA is a joke. The problems are:

  1. They are very slow. These scripting languages are interpreted line-by-line. They are not built for parallel computing. (OK. I know Matlab/R can do parallel computing but who use them in practice? Which traders understand immutability to write good parallel code?)
  2. They do not handle a lot of data well. How do you handle two year worth of EUR/USD tick by tick data in Matlab/R?
  3. There is no modern software engineering tools built for Matlab/R. How do you know your code is correct? Most traders just think that their code is correct because they are not trained programmers.
  4. The code cannot be debugged easily. Ok. Matlab comes with a toy debugger somewhat better than gdb. It does not compare to NetBeans, Eclipse or IntelliJ IDEA.
  5. How do you debug a large Matlab/R application with 50 pages of scripts anyway? I usually just give up.

You can tell that Matlab/R is not fit for financial modelling simply by observing that no serious person/bank/fund does option pricing in Matlab/R. My former employers do not price our portfolios using Matlab/R on my workstation. They deploy a few thousands computers in a grid to run the pricing code for the whole night! Likewise, we need a cluster of computers to run our trading models for many different scenarios with many different parameters before we are confident to bet a few 100 million on them.

Algo Quant is a first attempt by Numerical Method Inc. to solve the technology problem. Algo Quant is an embodiment of the systematic algorithmic trading research process that we discussed. Algo Quant is a suite of algorithmic trading research tools (with source code) for trading idea development, quick prototyping, data preparation, in-samples calibration, out-samples backtesting, even automatic trading strategy generation. Algo Quant is backed by SuanShu, a powerful modern library of mathematics, statistics and data mining. For more information, please check out this.

If you are a passionate quant/trader/programmer who shares our vision to revolutionize the algorithmic trading industry, please join us! If you are a hedge fund who wants to hire us to implement this scientific trading research process in house, please contact us.

Gain Capital FX rate data

For most non-professional FX traders, one hurdle to start your own algorithmic trading research is tick-by-tick data. Gain Capital Group has been very kind and does the community a very good service by providing historical rate data on their website: http://ratedata.gaincapital.com/. The data look OK.
Unfortunately, one serious problem that catches eyes are the very ad-hoc formats of the data files. For example,
  • The newer csv files store data by weeks (e.g., in year 2010) while some older csv files store data by quarters (e.g., in year 2000).
  • Some zipped files contains folders while other contains csv files.
  • Some zipped files contains other zipped files.
  • Some csv files have headers while others don’t.
  • The orderings of the columns are not all the same for all csv files. Some csv files have even missing columns.
  • The timestamp formats are not all the same.
  • Worst, the csv files have different encoding!
In order to process these “raw” data zipped files into useful and more importantly usable format that we can use for research, I recommend these following steps.
  1. We first unzip the raw zipped files recursively until we store store all plain csv files in one single folder. This is to handle the situation where zipped files in zipped files and folders in folders.
  2. Read each csv file using an appropriate parser (depending on the encoding, format, headers, etc.), row by row, line by line.
  3. Group the rows of the same date together and write them out as one csv file, e.g., AUDCAD.2007-10-7.csv.zip.
  4. Zip the csv file for easy archive.
  5. Write an adapter to read the processed zipped csv files and convert the data into whatever format your backtesting tool reads.
I have written some code to automate these steps, and they are now available in Algo Quant for free. Specifically, the important Java classes are:
  1. GainCapitalFxCsvFilesExtractor.java: unzip all csv files into one folder
  2. GainCapitalFXRawFile.java: read the Gain Capital csv files of various formats, save and zip the data by dates.
  3. GainCapitalFXRawFile.java: read the zipped csv data files.
  4. GainCapitalFXCacheFactory.java: the data feed/adapter to read the zipped csv data files into the Algo Quant backtesting system.
Here is a sample usage:
To unzip the raw zipped files from Gain Capital, we do
    public void test_0010() {
        GainCapitalFxCsvFilesExtracter extracter = new GainCapitalFxCsvFilesExtracter();
        String outputDirName = "./log/tmpGCcsv";
        String tmpDirName = "./log/tmpDir";
        int nCSV = extracter.extract("./test-data/gaincapital/2000", outputDirName,
            tmpDirName);
        assertEquals(18, nCSV);

        // clean up
        NMIO.deleteDir(new File(outputDirName));
        NMIO.deleteDir(new File(tmpDirName));
    }
To extract from the raw files the rows, group, save, and zip them by dates, we do
    public void test_0030() throws IOException {
        String input = "C:/Temp/GCcsv";
        String output = "C:/Temp/byDates";

        File outputDir = new File(output);
        if (!outputDir.canWrite()) {
            outputDir.mkdir();
        }

        File inputDir = new File(input);

        //reading the file names
        File[] csvs = inputDir.listFiles(new FilenameFilter() {

            public boolean accept(File dir, String name) {
                return name.matches(".*[.]csv");
            }
        });

        //these pairs are already processed before, e.g., the program crashed
        Set done = new HashSet();
//        done.add("AUD_USD");
//        done.add("CHF_JPY");

        Set pairs = new HashSet();
        for (File csv : csvs) {
            String pair = csv.getName().substring(0, 7);
            if (pair.matches("..._...")) {
                if (!done.contains(pair)) {
                    pairs.add(pair);
                }
            }
        }

        GainCapitalFxDailyCsvZipFileConverter converter = new
              GainCapitalFxDailyCsvZipFileConverter();
        for (String pair : pairs) {
            String ccy1 = pair.substring(0, 3);
            String ccy2 = pair.substring(4, 7);
            System.out.println(String.format("working on %s/%s", ccy1, ccy2));
            converter.process(
                    ccy1, ccy2,
                    input,
                    String.format("%s/%s%s", output, ccy1, ccy2));
        }
    }
Have fun using the Gain Capital FX data for your own trading research!