data processing

Post by **orion** » 20 Jan 2015

Since the topic of fileAppend() and print() came up here and here, the following guidelines may be of interest to MC users:

1. Use fileAppend() only if you must. It is incredibly slow. Use print() if you can, since it is about 47X faster than fileAppend().

2. There are some interesting use cases for fileAppend:
a) Appending to a file that you do not want to truncate. Say you have trade logs from prior auto trading sessions and you want to keep all trade logs consolidated in one file. In this case, print() will not work since it will truncate the file.
b) Logging to a single file from multiple strategies. Say you have multiple strategies running on one symbol. In this case, print() will not work since it does not release the file handle, so only one strategy will be able to write to the file.

3. Use ELC for file writing if you want to turbocharge your file write performance. It is about 8X faster than print() and about 390X faster than fileAppend().

These numbers are based on a benchmark of millions of file writes with each of the three methods. The tick data write rates are listed below. They are for comparative purposes only: writing tick data files with any of these methods is not recommended, since even the ELC method is too slow.

Tick writes per second (tps)
fileAppend (790 tps)
print (37,000 tps)
ELC (310,000 tps)
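
As a quick check, the speed-up factors quoted in the guidelines follow directly from these throughput figures. Here is a minimal Python sketch of the arithmetic. The names `TPS` and `speedup` are only illustrative and are not part of MC:

```python
# Throughput figures from the benchmark above, in tick writes per second.
TPS = {"fileAppend": 790, "print": 37_000, "ELC": 310_000}


def speedup(fast: str, slow: str) -> float:
    """Return how many times faster `fast` is than `slow`."""
    return TPS[fast] / TPS[slow]


# Compare each pair of methods mentioned in the guidelines.
for fast, slow in [("print", "fileAppend"), ("ELC", "print"), ("ELC", "fileAppend")]:
    print(f"{fast} vs {slow}: {speedup(fast, slow):.1f}X")
```

This works out to roughly 46.8X, 8.4X and 392.4X, which match the rounded 47X, 8X and 390X figures.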

Now a recommendation to MC management. MC started with a chart-centric model but with the new portfolio trader, which is a fantastic tool, it is clear that the chart-centric model can be improved on. The proposal here is to make MC a little more data-friendly by offering users the capability to peek and poke the database via two new reserved words called importData() and exportData(). Note that the proposal is to make MC more data-friendly and not necessarily data-centric since the majority of users love the visual feedback of charts.

These two reserved words would import and export data to and from ASCII files, emulating the ASCII file import and export available by mouse clicks in the Quote Manager (QM). The mouse click method is just not scalable. It works for users who work with a few symbols, as is the case for forex and futures traders, but it does not work for equities trading, where you have a few thousand symbols. The click method has major scalability limitations for other uses too. For example, it does not work for a user who frequently imports and exports tick data in order to massage it in a proprietary way using external tools. Currently, all of this data is locked in the database, and there is no scalable way of exposing it to external tools and then writing the processed data back into the database.

So what do importData() and exportData() look like? Here is a suggested API:

importData(symbol, dataSrc, startDate, endDate, resolution, fileName)
exportData(symbol, dataSrc, startDate, endDate, resolution, fileName)

So exportData("@AD", "TS", 20070101, 20141231, "T", "c:\@AD.csv") would export tick data for the specified time period, just as a series of mouse clicks in QM would. The import and export would run at the native speed available with the mouse clicks in QM, which is a lot faster than even ELC. The underlying infrastructure for importData() and exportData() is already there in MC, since these would work the same way as the mouse click method. The two reserved words are simply wrappers that expose that underlying infrastructure to interested users.
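
To illustrate the scalability point, here is a rough Python sketch that generates one proposed exportData() call per symbol. exportData() is only the proposal from this post and does not exist in MC. The helper name `export_call`, the symbol list and the output directory are assumptions made up for the example:

```python
def export_call(symbol: str, data_src: str, start: int, end: int,
                resolution: str, out_dir: str = "c:\\") -> str:
    """Format one call to the proposed (hypothetical) exportData() reserved word."""
    file_name = f"{out_dir}{symbol}.csv"
    return (f'exportData("{symbol}", "{data_src}", {start}, {end}, '
            f'"{resolution}", "{file_name}")')


# Hypothetical symbol list; an equities user would have a few thousand entries.
symbols = ["@AD", "@ES", "@CL"]

# Build one export call per symbol instead of one round of mouse clicks per symbol.
calls = [export_call(s, "TS", 20070101, 20141231, "T") for s in symbols]
for call in calls:
    print(call)
```

The first generated line is the same call as in the example above. With a scripted call, the work per additional symbol is one more entry in the list rather than another round of clicks in QM.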