Gain Capital FX rate data

Posted on May 1, 2011
By haksunli
In AlgoQuant, Algorithmic Trading
17 comments

For most non-professional FX traders, one hurdle to start your own algorithmic trading research is tick-by-tick data. Gain Capital Group has been very kind and does the community a very good service by providing historical rate data on their website: http://ratedata.gaincapital.com/. The data look OK.

Unfortunately, one serious problem that catches eyes are the very ad-hoc formats of the data files. For example,

The newer csv files store data by weeks (e.g., in year 2010) while some older csv files store data by quarters (e.g., in year 2000).
Some zipped files contains folders while other contains csv files.
Some zipped files contains other zipped files.
Some csv files have headers while others don’t.
The orderings of the columns are not all the same for all csv files. Some csv files have even missing columns.
The timestamp formats are not all the same.
Worst, the csv files have different encoding!

In order to process these “raw” data zipped files into useful and more importantly usable format that we can use for research, I recommend these following steps.

We first unzip the raw zipped files recursively until we store store all plain csv files in one single folder. This is to handle the situation where zipped files in zipped files and folders in folders.
Read each csv file using an appropriate parser (depending on the encoding, format, headers, etc.), row by row, line by line.
Group the rows of the same date together and write them out as one csv file, e.g., AUDCAD.2007-10-7.csv.zip.
Zip the csv file for easy archive.
Write an adapter to read the processed zipped csv files and convert the data into whatever format your backtesting tool reads.

I have written some code to automate these steps, and they are now available in Algo Quant for free. Specifically, the important Java classes are:

GainCapitalFxCsvFilesExtractor.java: unzip all csv files into one folder
GainCapitalFXRawFile.java: read the Gain Capital csv files of various formats, save and zip the data by dates.
GainCapitalFXRawFile.java: read the zipped csv data files.
GainCapitalFXCacheFactory.java: the data feed/adapter to read the zipped csv data files into the Algo Quant backtesting system.

Here is a sample usage:

To unzip the raw zipped files from Gain Capital, we do

(c.f., source code here)

    public void test_0010() {
        GainCapitalFxCsvFilesExtracter extracter = new GainCapitalFxCsvFilesExtracter();
        String outputDirName = "./log/tmpGCcsv";
        String tmpDirName = "./log/tmpDir";
        int nCSV = extracter.extract("./test-data/gaincapital/2000", outputDirName,
            tmpDirName);
        assertEquals(18, nCSV);

        // clean up
        NMIO.deleteDir(new File(outputDirName));
        NMIO.deleteDir(new File(tmpDirName));
    }

To extract from the raw files the rows, group, save, and zip them by dates, we do

(c.f., source code here)

    public void test_0030() throws IOException {
        String input = "C:/Temp/GCcsv";
        String output = "C:/Temp/byDates";

        File outputDir = new File(output);
        if (!outputDir.canWrite()) {
            outputDir.mkdir();
        }

        File inputDir = new File(input);

        //reading the file names
        File[] csvs = inputDir.listFiles(new FilenameFilter() {

            public boolean accept(File dir, String name) {
                return name.matches(".*[.]csv");
            }
        });

        //these pairs are already processed before, e.g., the program crashed
        Set done = new HashSet();
//        done.add("AUD_USD");
//        done.add("CHF_JPY");

        Set pairs = new HashSet();
        for (File csv : csvs) {
            String pair = csv.getName().substring(0, 7);
            if (pair.matches("..._...")) {
                if (!done.contains(pair)) {
                    pairs.add(pair);
                }
            }
        }

        GainCapitalFxDailyCsvZipFileConverter converter = new
              GainCapitalFxDailyCsvZipFileConverter();
        for (String pair : pairs) {
            String ccy1 = pair.substring(0, 3);
            String ccy2 = pair.substring(4, 7);
            System.out.println(String.format("working on %s/%s", ccy1, ccy2));
            converter.process(
                    ccy1, ccy2,
                    input,
                    String.format("%s/%s%s", output, ccy1, ccy2));
        }
    }

Have fun using the Gain Capital FX data for your own trading research!

haksunli

17 Comments

junior trader
May 3, 2011 at 3:42 pm
This is extremely useful for my own research. I have been looking for this for long time. Thank you!
- Log in to Reply
- Link
gigi
May 29, 2011 at 10:15 am
I’ve done almost the same thing (but in Python – much easier). Instead of writing back to CSV files, I wrote to binary files, which are much faster to read back, since you don’t have to parse the date again.
- Log in to Reply
- Link
Haksun LiModerator
May 31, 2011 at 2:55 am
Would you be able to share your Python scripts please?

By the way, would you like to read your data for, e.g., examination and verification, in csv rather than binary files?
- Log in to Reply
- Link
M L
September 9, 2011 at 7:26 pm
This was very useful indeed.

For those like me coming from the .Net World, here is a step-by-step guide :

1 – Don’t bother trying to compile the classes presented on this page in a standalone java project. It won’t work. You NEED to Download Algoquant directly (See link to the software given in the above article), then Download Netbeans (a Java IDE – I tried to use eclipse, my preferred java IDE, but had a hard time configuring all the jars libs, packages, and stuff…)
There is a nbproject folder that will automatically be recognized by NetBeans when you open the algoquant folder (in “open project”) and you’ll have the right structure with everything compiling from the start

2 – Keep the folder structure as is. Yes, you need that publicmodels folder too, in the same level as the algoquant folder, in order for the stuff to compile.

3 – Download a trial license (this is a bit annoying, but fair enough). You have to register to the forum (I hope you did not skip math class in junior high 😛 ),
and then navigate to Numerical Method Forum > General Category > General Discussion > Numerical Method Trial License (Otherwise you might just end up to an error page). copy the numericalmethod.lic file to the algoquant folder and replace the older one (otherwise when you try to execute stuff you will have an outdated/invalid license error)

4 – Go to com.numericalmethod.algoquant.data.historicaldata.gaincapital in the NetBeans project Navigatin Pane, open the test classes you want,

change folder names in the test methods you want to run, mark them static,
and create amain method to call them, ie :
public static void main(String[] args) {
test_xxxx();
}

5 – RIght Click on the test file you need and click “Run File” (Or just click shit+F6)
Voilà

6 – Note :
step A is GainCapitalFxCsvFilesExtractorTest
add :
public static void main(String[] args) {
test_1000();
}

step B is GainCapitalFxDailyCsvZipFileConverterTest
add :
public static void main(String[] args) {
try{ test_0030();}
catch(Exception ex){
}
}

The “Week1.2.csv” naming is VERY unhandy (after step A), so just go to step B and watch the magic after pointing to a fresh created clean data directory (output string for test_0030() ) : Data is grouped by FX Pair Folders, each one containing one file per day. Hurray !

Note that Right side zeros are skipped in the output format (ie 14.537000 turns to 14.53) and the timestamp is a bit odd (2007-01-02 00:02:24:0000 America/New_York) but you can change this easily.

Conclusion :
This tool is very precious and is a perfect starter that you can later customize to fit your needs, while saving a lot of time so you can focus you precious fingers for coding something more value-adding than data preparation.
I might suggest the use of binary format rather that zip format as it turns to be really quicker in the backtesting process.

Peace.

ML – AMC
- Log in to Reply
- Link
Quant Trader
September 10, 2011 at 12:29 am
ML’s step-by-step guide is very helpful! Thank you for writing it up.
Let’s push them to get the binary format to make the process faster.

Note: they recently got rid of “publicmodels” so there needs only one folder (“algoquant”).
- Log in to Reply
- Link
Haksun LiModerator
September 10, 2011 at 4:29 am
ML,

Thank you very much for reporting a problem with AlgoQuant on processing Gain Capital data.

It appears that Gain Capital has recently changed its timestamp for the sub-second part from millisecond (3 digits) to nanosecond (9 digits). AlgoQuant version 0.0.5 and before will fail to process the most recent raw data file.

For instance, this new format will fail:
lTid,cDealable,CurrencyPair,RateDateTime,RateBid,RateAsk
0000000000,D,EUR/USD,2011-08-28 17:00:10.727000000,1.1,1.2
0000000001,D,EUR/USD,2011-08-28 17:00:11.073000000,1.3,1.4

0000000002,D,EUR/USD,2011-08-28 17:01:17.963000000,1.5,1.6
0000000003,D,EUR/USD,2011-08-28 17:01:20.027000000,1.7,1.8

(Note the nanosecond representation in the data; it was millisecond).

Based on your feedback, we have modified AlgoQuant to account for the new timestamp format.

The new code can be found here:
http://redmine.numericalmethod.com/projects/public/repository/svn-algoquant/entry/core/src/main/java/com/numericalmethod/algoquant/data/historicaldata/gaincapital/preprocess/GainCapitalFXDailyCsvZipFileConverter.java

We now also throw an exception when there is an unrecognized (new) timestamp format in case Gain Capital changes the format again.

A testing case can be found here:
http://redmine.numericalmethod.com/projects/public/repository/svn-algoquant/entry/core/src/test/java/com/numericalmethod/algoquant/data/historicaldata/gaincapital/preprocess/GainCapitalFXDailyCsvZipFileConverterTest.java
(test_0030)

Hope this helps.

Thanks,
Haksun
- Log in to Reply
- Link
Haksun LiModerator
September 12, 2011 at 10:44 pm
Running AlgoQuant in Eclipse is quite simple:
http://numericalmethod.com/forums/topic/how-to-use-suanshu-library-in-eclipse-ide-java/
- Log in to Reply
- Link
Quant Trader
November 1, 2011 at 12:52 pm
It looks like Gain Capital has recently added sub-seconds to the time stamps in the FX rate data.
- Log in to Reply
- Link
nuke
January 9, 2012 at 3:40 am
can any1 please explain the ltid column
what time is 1379074668 how do i parse it?
- Log in to Reply
- Link
Quant Trader
January 9, 2012 at 4:36 pm
“ltid” is probably just their internal identifier. It is not used in the data. AlgoQuant parses them as “long”.

See this for reference:
http://redmine.numericalmethod.com/projects/public/repository/svn-algoquant/entry/core/src/test/java/com/numericalmethod/algoquant/data/historicaldata/gaincapital/preprocess/GainCapitalFXRawFileTest.java
- Log in to Reply
- Link
Alex
December 6, 2012 at 1:14 am
Nice Haksun —

Thanks for the down and dirty algo method.

Special thanks to JL for the .NET clarification– 🙂

Thanks and regards,
Alex N
- Log in to Reply
- Link
Qitian
July 10, 2013 at 11:13 pm
Hi Prof Li, I was in your class conducted in NTU, Apr 2012. I am at the moment big fan of python and it is very easy to read Gain Capital data. Code snippet,

# import modules
import pandas as pd
from datetime import datetime

# define a parser, for the more recent timestamp
def gain_parser(dt_str):
try:
return datetime.strptime(dt_str[:-3],’%Y-%m-%d %H:%M:%S.%f’)
except Exception, e:
return datetime.strptime(dt_str,’%Y-%m-%d %H:%M:%S’)

# parse the data
usdjpy = pd.read_csv(‘USD_JPY_Week1 2.csv’,index_col=3,date_parser=gain_parser)

# preview data
print usdjpy.head()

———————————————————
lTid cDealable CurrencyPair RateBid RateAsk
RateDateTime
2013-03-03 17:00:08.673000 2632942494 D USD/JPY 93.634 93.654
2013-03-03 17:00:09.017000 2632942497 D USD/JPY 93.636 93.656
2013-03-03 17:00:09.360000 2632942498 D USD/JPY 93.634 93.654
2013-03-03 17:00:09.707000 2632942503 D USD/JPY 93.632 93.652
2013-03-03 17:00:10.393000 2632942516 D USD/JPY 93.628 93.648
- Log in to Reply
- Link
Haksun Li
July 12, 2013 at 1:54 am
Thank you, Qitian, for the information.
- Log in to Reply
- Link
Moon
August 5, 2013 at 9:15 pm
Haksun,

Can you tell me what timestamp GAINDATA has? Is it NY time? And if it is does it include DayLightSavings? I am trying to convert GAINDATA timestamps to UTC.
- Log in to Reply
- Link
Haksun LiModerator
August 6, 2013 at 3:23 am
They are NY time with daylight savings, I believe.
- Log in to Reply
- Link
Kingsley
May 1, 2015 at 1:22 am
Haksun,

thanks for the useful code to parse these data files.
Do you know if the sell/buy prices they provide are the indicative or executable prices?
- Log in to Reply
- Link
Haksun LiModerator
May 1, 2015 at 5:52 pm
Unfortunately, I do not know the source of their data.
- Log in to Reply
- Link