Monday, December 22, 2008
Thoughts on fMRI
(http://www.medgadget.com/archives/2008/12/fmri_extracts_images_from_the_brain.html).
They also discuss the possibilities of using it for dream analysis. Check out their site at
http://www.cns.atr.jp/indexE.html. Still researching and learning, I thought I would examine packages for the R language. The fmri package in R is a library of functions for performing fMRI analysis based on Tabelow et al. (2006). The web site for their work is at
(http://www.wias-berlin.de/people/tabelow/doc/tsb2008.pdf).
The reference manual can be found at http://cran.r-project.org/web/packages/fmri/fmri.pdf. The package has routines for Non-Gaussian Component Analysis based on Blanchard et al. (2005) and for Independent Component Analysis. Also, the expected BOLD response can be created from task indicator functions as shown in the manual. A good paper on ICA by Bai, Shen and Truong (2006) is at http://www.samsi.info/200607/ranmat/workinggroup/rc/ranmat-rc/Truong_main.pdf. They use the R package called AnalyzeFMRI. The manual is at
http://cran.r-project.org/web/packages/AnalyzeFMRI/AnalyzeFMRI.pdf
To get started using these packages, there are instructions at
https://mri.radiology.uiowa.edu/mediawiki/index.php/FMR_Analysis_in_R
that involve converting an AFNI image into a single 4D Analyze image and then creating a
mask. AFNI (Analysis of Functional Neuroimages) is an open source application for fMRI data
analysis at http://afni.nimh.nih.gov/afni/. Currently, there are no ports of the application to the Windows environment.
I have done some initial experiments with both packages using simulation data and will discuss them more in coming posts. In order to put these packages in perspective, there is a good, but dated, article by Gold et al. (1998), "Functional MRI Statistical Software Packages: A Comparative Analysis". As far as the teaching aspect, I like the paper by Lai et al., "Teaching Statistical Analysis of fMRI Data", at http://www.vanth.org/docs/Lai_Gollub_Greenberg_ASEE03.pdf; however, they use Matlab software for their work.
References
Blanchard, G., Kawanabe, M., Sugiyama, M., Spokoiny, V. and Müller, K.-R. (2005). In Search of Non-Gaussian Components of a High-Dimensional Distribution. Journal of Machine Learning Research, pp. 1-48.
Tabelow, K., Polzehl, J., Voss, H.U., and Spokoiny, V.(2006). Analysing fMRI experiments with
structure adaptive smoothing procedures, NeuroImage, 33:55-62.
Thursday, December 18, 2008
Time Series Research for Neuroscience
For example, recent developments in the use of both spatial and time series methods for modeling measurements from MEG/EEG data have become important. By nature, time series methods are confined to the temporal domain, but in this context they apply to the spatial domain as well. For example, electromagnetic changes are measured at between 20 and 100 different locations on the brain's surface every ten milliseconds. As another example, functional MRI (fMRI) data, which generates an over-140,000-dimensional time series every 2-3 seconds, requires spatial-temporal modeling as well. Because of the complexity in both the spatial and temporal domains, the neuroscientific community is expected to build models that both describe and explain this complexity. Here is the overlap between building data trading models for the stock market at different frequencies and modeling the brain.
Examples of computational neuroscience software for time series data can be found at http://home.earthlink.net/~perlewitz/sftwr.html#timeseries and come in a wide variety of flavors. However, I like the approach mentioned in my last blog entry by Kratzig on building user-interface frameworks, such as in Java, that accommodate implementation of the latest research advances by using the APIs of statistical engines. Furthermore, the use of different model ontologies developed in Protégé and OWL, i.e., models, parameters, algorithms, statistics and diagnostics that can be shared, would lead to great progress in the area. Ultimately, instead of all possible models being estimated, analyzed and deployed, spatial time series characteristics can suggest particular models in the form of an expert system to provide real-time analysis and prediction. Microsoft's SQL Server 2008 now provides a spatial engine to combine with its temporal data types that can aid in this type of pattern recognition.
Current methods such as ICA (Independent Component Analysis) and SPM (Statistical Parametric Mapping) have become popular techniques, but they ignore the stochastic dynamics inherent in the time series. Of course, developments in spatial statistics and the use of improved probability models, with both frequentist and Bayesian approaches to modeling the higher moments, i.e., conditional means, variances, skewness and kurtosis, lead to models with more realistic dynamics that can show meaningful cross-correlations, co-integrations and emergent phenomena, i.e., chaos theory. Furthermore, the use of wavelet techniques to decompose the time series among different spatial channels provides additional opportunities to gain valuable insight into the frequencies, harmonics and their correlations in both the spatial and temporal domains.
My current and ongoing research is four-fold:
(1) Understanding the different model types presented in the literature for spatial-temporal modeling, in both the frequentist and Bayesian paradigms, available to the neuroscientist
(2) The development and deployment in different languages of the statistical algorithms for (1)
(3) The construction of software, i.e. APIs for different statistical engines that implement both (1) and (2) in the context of solving and describing neuro-scientific problems as mentioned above
(4) Description of ways to move (1)-(3) into expert systems to aid in the accurate diagnosis of pathologies for training fellow neuroscientists
All four parts of this research agenda fit the Boyer model through discovery, teaching, integration, and application. Ultimately, architectures are ontologies as well, and there will come a time when these will automatically generate code to solve particular problems without any human intervention. Meanwhile, experimentation is needed to distill these rules... But I digress.
The reference list below is just a small sample of the developing literature in this regard.
References
Galka, A., Yamashita, O. and Ozaki, T. (2004). "GARCH modelling of covariance in dynamical estimation of inverse solutions", Physics Letters A, 333, 261-268.
Jimenez, J.C. and Ozaki, T. (2005). "An approximate innovation method for the estimation of diffusion processes from discrete data", J. Time Series Analysis, in press.
Riera, J., Aubert, E., Iwata, K., Kawashima, R., Wan, X. and Ozaki, T. (2005). "Fusing EEG and fMRI based on a bottom-up model: inferring activation and effective connectivity in neural masses", Phil. Trans. of the Royal Society, Biological Sciences, vol. 360, no. 1457, 1025-1041.
Riera, J., Yamashita, O., Kawashima, R., Sadato, N., Okada, T. and Ozaki, T. (2004). "fMRI activation maps based on the NN-ARX model", Neuroimage, 23, 680-697.
Wong, K.F., Galka, A., Yamashita, O. and Ozaki, T. (2005). "Modelling non-stationary variance in EEG time series by state space GARCH model", Computers in Biology and Medicine, in press.
Yamashita, O., Sadato, N., Okada, T. and Ozaki, T. (2005). "Evaluating frequency-wise directed connectivity of BOLD signals applying relative power contribution with the linear multivariate time series models", Neuroimage, vol. 25, 478-490.
Yamashita, O., Galka, A., Ozaki, T., Biscay, R. and Valdes-Sosa, P. (2004). "Recursive least squares solution for dynamic inverse problems of EEG generation", Human Brain Mapping, vol. 21, issue 4, 221-235.
Tuesday, December 16, 2008
Java and Time Series Analysis


Wednesday, December 10, 2008
Vectors, Likelihoods and Partial Derivatives
GNU Scientific Library Reference Manual, Revised Second Edition (v1.8), by M. Galassi, J. Davies, J. Theiler, B. Gough, G. Jungman, M. Booth and F. Rossi. Paperback (6"x9"), 636 pages, 60 figures. ISBN 0954161734. RRP £24.99 ($39.99).
gsl_vector samples:
http://www.network-theory.co.uk/docs/gslref/Exampleprogramsforvectors.html
gsl_matrix samples:
http://www.network-theory.co.uk/docs/gslref/Exampleprogramsformatrices.html
I want to be able to compute the log-likelihood from Brockwell and Davis (1991) and to use different criteria such as AIC, BIC, etc., as shown in my time series books. Basically, it is a minimization of -(log-likelihood) that uses gsl_multimin_function and gsl_multimin_fminimizer.
Here is an example:
http://aldebaran.devinci.fr/~cagnol/promotion2007/cs302/gsl/multimin/gsl_multimin.h.html
Furthermore, I have to be able to calculate the partial derivatives of the log-likelihood to obtain the information matrix for the parameters. For this I can use gsl_deriv_central.
An example:
http://www.gnu.org/software/gsl/manual/html_node/Numerical-Differentiation-functions.html
A good example on non-linear least squares can be found at:
http://www.physics.brocku.ca/~tharroun/parratt/group__lstsq.html
and multidimensional minimization
http://www.physics.brocku.ca/~tharroun/gsl_fit/group__mdmin.html
Currently, I am working on examples for each of these areas to post, and on modifications to create a forecastable ARMA model class that I can extend to GARCH and stochastic volatility models (SVM). This will enable me to use additional research algorithms in these components.
References
Brockwell, P.J. and Davis, R.A. (1991). Time Series: Theory and Methods, 2nd ed. Springer-Verlag, New York.
Stroustrup, B. (1997). The C++ Programming Language, 3rd ed., Section 22.4, Vector Arithmetic. Addison-Wesley. ISBN 0-201-88954-4.
Wednesday, December 3, 2008
GSL 1.11 and .NET VC++ 2008

ReadMe document which I quote here:
"...Settings you have to change when creating your own project :
- additional library directory should point to gsl\lib
His example code for solving a linear system with matrix decompositions, which I got to compile, link and run in VC++ 2008 Express, is reproduced here.
#include <stdio.h>
#include <stdlib.h>
#include <gsl/gsl_linalg.h>
////////////////////////////////////////////////////////////
// Solve Ax = b with LU and Cholesky
int main(int argc, char **argv)
{
  printf("=========== tst2 ===========\n");
  double a_data[] = { 2,1,1,3,2,
                      1,2,2,1,1,
                      1,2,9,1,5,
                      3,1,1,7,1,
                      2,1,5,1,8 };
  double b_data[] = { -2,4,3,-5,1 };
  gsl_vector *x = gsl_vector_alloc (5);
  gsl_permutation *p = gsl_permutation_alloc (5);
  gsl_matrix_view m = gsl_matrix_view_array (a_data, 5, 5);
  gsl_vector_view b = gsl_vector_view_array (b_data, 5);
  int s;
  gsl_linalg_LU_decomp (&m.matrix, p, &s);
  gsl_linalg_LU_solve (&m.matrix, p, &b.vector, x);
  printf ("x = \n");
  gsl_vector_fprintf (stdout, x, "%g");
  // Rebuild A and b, since the decomposition overwrites them in place.
  double a2_data[] = { 2,1,1,3,2,
                       1,2,2,1,1,
                       1,2,9,1,5,
                       3,1,1,7,1,
                       2,1,5,1,8 };
  double b2_data[] = { -2,4,3,-5,1 };
  gsl_matrix_view m2 = gsl_matrix_view_array (a2_data, 5, 5);
  gsl_vector_view b2 = gsl_vector_view_array (b2_data, 5);
  gsl_linalg_cholesky_decomp (&m2.matrix);
  gsl_linalg_cholesky_solve (&m2.matrix, &b2.vector, x);
  printf ("x = \n");
  gsl_vector_fprintf (stdout, x, "%g");
  gsl_permutation_free (p);
  gsl_vector_free (x);
  system ("pause");
  return 0;
}
.NET Stochastic Volatility Models

http://www.stat.cmu.edu/~abrock/cronos/index.html
I downloaded the software as well as the source for modifying the C++ algorithms based on the open source thread safe GSL-GNU scientific library at
http://www.gnu.org/software/gsl/
for the optimization routines. Of course, I have to ramp up to be able to implement them in Visual Studio 2008 C++ or C# Express, but I should be able to accomplish that this month. Here is an example of the SVM code, from the svm.h file:
/*
* -------------------------------------------------------------------
*
* Copyright 2005 Anthony Brockwell
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Library General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Library General Public License for more details.
*
* You should have received a copy of the GNU Library General Public
* License along with this library; if not, write to the Free
* Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*
* -------------------------------------------------------------------
*/
#ifndef SVM_H
#define SVM_H
// simple stochastic volatility model first
// Y_t \sim N(\mu_Y,\sigma_t^2)
// \sigma_t = \exp(X_t + \mu_X)
// X_t = \phi X_{t-1} + \epsilon_t, {\epsilon_t} \sim IIDN(0,\nu^2)
class SSVModel : public TimeSeriesModel {
protected:
  double mean, mux, phi, nu;
public:
  SSVModel();   // constructor
  int FitModel(TimeSeries *ts, const int method, const int ilimit,
               void (*itercallback)(int,void *), void *cb_parms,
               ostringstream& msg, ostringstream& supplementalinfo,
               bool get_parameter_cov=true);
  void SimulateInto(TimeSeries *, int n, bool ovw);
  Matrix SimulatePredictive(TimeSeries&, int nsteps, int nsims);
  void RenderInto(ostringstream&);
  bool RenderCtsVersionInto(ostringstream &output, ostringstream &error, double delta);
  void ComputeACFPACF(Vector &acf, Vector *pacf, bool normalizeacf=true);
  void Forecast(TimeSeries *, int nsteps, double *fret, double *fmse);
  void ComputeStdResiduals(TimeSeries *data, TimeSeries *resids);
  double ComputeLogLikelihood(TimeSeries *);
  bool CheckModel(bool flipit);
  Vector ParameterBundle();
  void UnbundleParameters(Vector& v);
  Vector ValidityConstraint();
  Vector GetMask();   // returns estimation_mask, padded with defaults
  Vector GetDefaultParameters(Vector& mask, TimeSeries *tsp);
      // returns initial guesses for non-held model parameters;
      // other parameters are fixed at current values
  void StreamParameterName(ostringstream& strm, int parmnum);
      // nms[0],... must already be valid allocated strings with length >= 20
  Vector ComputeSpectralDensity(Vector& omegas);
  double Cdf(double y, double mu, double sig2);
  double InvCdf(double u, double mu, double sig2);
};
#endif
I used the Cronos application to estimate an SV model and produce some forecasts, as shown in Figure 2. The next step is to show the use of the application and discuss time series analysis in a video.
Figure 2.

To learn more about stochastic volatility models, a good place to start is the presentation by Peter Jackel at
http://www-stat.wharton.upenn.edu/~steele/Courses/95/ResourceDetails/SV/StochasticVolatilityModels-PastPresentAndFuture.pdf
The next step in this continued research on algorithmic trading is to review the C++ architecture for doing an SVM in the context of the algorithmic trading architecture, so I can finish my paper and discuss some of my observations along the way.
Monday, December 1, 2008
Portfolio Optimization

They permit using their product for research, which is my intention here. They have five lessons on video for using the software: (1) Data Management, (2) Portfolio Construction, (3) Portfolio Optimization, (4) Value-at-Risk Analysis and (5) the Black-Litterman Model. They also have an excellent section on features, with associated PDFs, covering portfolio construction, parameter estimation, portfolio optimization, target probabilities, Value-at-Risk, historical simulations and data management.
Friday, November 28, 2008
Blue Threads For C++
(1) Agent Strategies for Automated Stock Market Trading
(2) Automated Trading Algorithms in C++.Net
(3) FRACNET: The Conditional Simulation of Cascade Models
and reviewing the content at cplusplus.com. I also viewed the seven-part series "Threading in .NET" on the Intel Software Network:
http://software.intel.com/en-us/videos/threading-in-net-best-practices-1-of-7-series
which led me to the work by Intel on Threading Building Blocks at
http://www.threadingbuildingblocks.org/
I downloaded the 2.1 version to use with their tutorial and the Reinders (2007) book. In addition, at Developers.net the following case study on risk management and compliance
http://www.developers.net/intelisdshowcase/view/2534
discusses the 64-bit Intel Itanium 2 processors for Monte Carlo credit risk modeling. This permits banks to base their credit risk profiles and capital investment strategies on more complex simulations.
Some more good articles:
(1) Primer: Developing Multithreaded Applications
http://www.developers.net/intelisnshowcase/view/544
(2) Developing Multithreaded Applications: A Platform Consistent Approach
http://cache-www.intel.com/cd/00/00/05/15/51534_developing_multithreaded_applications.pdf
(3) Multithreaded Game Programming and Hyper-Threading Technology
http://software.intel.com/en-us/articles/multithreaded-game-programming-and-hyper-threading-technology
Since most of the transformations for trading are done on matrices, we have
(4) Matrix-Vector Multiplication and Multi-threading Benefits
http://software.intel.com/en-us/articles/matrix-vector-multiplication-and-multi-threading-benefits
(5) Three Methods for Speeding up Matrix-Vector Multiplication
http://www.developers.net/intelisnshowcase/view/152
Finally, we have
(6) Multi-Threading for Experts: Inside a Parallel Application
http://www.developers.net/intelisnshowcase/view/474
References
Reinders, James (2007). Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism. Sebastopol: O'Reilly Media. ISBN 978-0-596-51480-8.
Monday, November 24, 2008
C++ Programming Channel
Creative Experimentation:Statistical Arbitrage
I was bringing my daughter back from Penn State University yesterday for Thanksgiving break, and she had my Pi movie
http://www.pithemovie.com/gifpage.html
which I had not seen in some time and which seemed to fit right in with the work this morning. I am reviewing the statistical arbitrage work of Ed Thorp published by Wilmott Magazine. The first three parts are PDFs:
- http://www.wilmott.com/pdfs/080617_thorp.pdf
- http://www.wilmott.com/pdfs/080630_thorp.pdf
- http://www.wilmott.com/pdfs/080709_thorp.pdf
Parts 4-6 are Word documents and can be found in the references at
http://en.wikipedia.org/wiki/Statistical_arbitrage
The articles examine the work in this discipline through the 70s, 80s and 90s. Here are some of his thoughts, and I would encourage you to read all six parts of the article.
The Middle Game
Part 1:
Stocks with a large trading volume are called "liquid". They calculate a "fair" price for the largest and most traded stocks on both the New York and American Exchanges. Their strategy is to buy the stocks their prediction says are underpriced and short the overpriced ones. Each stock is 2.5 percent of the long portfolio; they limit the short position to 1.5 percent for each stock. Because of this strategy, their results are "positively skewed".
There are about 253 trading days per year.
Part 2:
The author reviews the meaning of arbitrage with hypothetical numerical examples.
First example:
"An example might be selling gold in London at $300 an ounce while at the same time buying it at, say, $290 in New York. Suppose the total cost to finance the deal and to insure and deliver the New York gold to London is $5, leaving a $5 sure profit. That's an arbitrage in its original usage." -Ed Thorp
They depend on a large number of favorable trades to eventually deliver a profit. The basic questions are "How inefficient is the market?" and "How can this be exploited to our advantage?" I agree that questions are more important than answers.
The article provides an overview to the historical answer to these questions.
Part 3:
A discussion on the refrigeration of CPUs and on risk reduction. In order to control risk, they segregated stocks into industry groups using factor analysis; this became the STAR model, with 55 industry factors and 13 macroeconomic factors.
Part 4:
The article makes an important point:
"Note that every stock market system is necessarily limited in the amount of money it can use to produce excess returns. One reason is that buying underpriced securities tends to raise the price, reducing or eliminating the mispricing, and selling short overpriced securities tends to reduce the price, once again reducing or eliminating the mispricing. Thus systems for beating the market are limited in size by the impact of their own trading." -Ed Thorp
Part 5:
A discussion on haggling. I like this quote, in the context of eighths of a dollar per share:
"As President Lyndon Johnson once said about congressional spending, a billion dollars here, a billion dollars there, and pretty soon you're talking about some real money." -Ed Thorp
Another right-on-target quote, which dovetails with my reading of Kasparov's latest book, "How Life Imitates Chess", is:
"It reminded me of my granddaughter Ava, who when asked at the age of two, 'What's happening?', replied 'Nothing's happening.' My beautiful ideas were rotting on the vine for lack of follow through. It was clear that if I wanted significant research and development we would have to do it 'in house.'"
Back to the Pi movie: looking at every detail of the problem, asking questions and leaving bread crumbs.
Part 6:
It is all about time... TIME... TIME... The discussion is on the hedge fund business: a statistical arbitrage hedge fund.
The End Game
The days are passing and ideas are percolating. I think back to Kasparov's book and the following quote:
Play the opening like a book,
the middle game like a magician,
and the end game like a machine. - Rudolf Spielmann
and his comment that the purpose of the opening isn't just to survive the beginning of the game; it is to set the stage for the type of middle game you want to play.
Thursday, November 20, 2008
Algorithmic Engines for Trading Strategies
As I continue my research this week, the following resources provide insight into the different types of algorithms available.
- http://www.itg.com/
- http://www.itg.com/news_events/papers/AlgoTradingCostsTheTrade2007.pdf
- http://advancedtrading.thewallstreetwiki.com/directories/directory-algorithmic-trading.php
Some Basic Strategies:
- Dark
- Active
- Volume
- Participation
- Volume-weighted average price (VWAP)
- Time-weighted average price (TWAP)
- Arrival Price or Implementation Shortfall (IS)
- Pair
- Short Sell
- Contingent
- http://www.nri.co.jp/english/opinion/papers/2007/pdf/np2007121.pdf
- http://www.capis.com/resources/pdf/Algorithms.pdf
- http://www.cs.tau.ac.il/~mansour/papers/06stoc.pdf
- http://web.unx.com/index.php?option=com_content&task=view&id=44&Itemid=26
- http://www.itg.com/news_events/papers/AlgoSelection20060425.pdf
II. VWAP (volume-weighted average price)
Anatomy of An Automated VWAP Strategy:
http://www.itg.com/news_events/papers/TP_Spring_2002_Madhavan.pdf
Code:
http://www.codeproject.com/KB/recipes/VWAP.aspx
Limit Order Trading:
http://www.cis.upenn.edu/~mkearns/papers/vwap.pdf
III. Conclusion
http://www.advancedtrading.com/showArticle.jhtml?articleID=196900760
Next step is to research current trading architectures.
Tuesday, November 18, 2008
White Label Algorithmic Trading Platforms
http://www.modulusfe.com/tasdk/vcpp.asp
Furthermore, they have a video demonstration of their application with discussion at
http://www.modulusfe.com/m4/
It is worth the investment of time to go through their site and think about a multi-tier, object-oriented approach, with modules and design patterns, to this kind of system, as well as about open-source or purchased building blocks for the application. More on this in the next post.
More research on this topic uncovered the following:
Sockets
- http://www.topjimmy.net/tjs/Pages/Development/TJFTP/
- http://www.codeproject.com/KB/cs/ChatApplDotNetSockets.aspx
- http://www.devx.com/dotnet/Article/28083
Multithreading
- http://www.devarticles.com/c/a/Cplusplus/Multithreading-in-C/
- http://www.c-sharpcorner.com/Articles/ArticleListing.aspx?SectionID=1&SubSectionID=149
Generalities
Monday, November 17, 2008
Automated E-Trading Systems in .NET
In addition, this research is on the use of a Smart Order Routing/Matching Engine and the architecture of Order Management Systems, and reviews fixed-income derivatives and exotics. Thus, the posts this week will use this as a basic motif.
Monday
Smart Order Routing/Matching Engine
- http://www.thetradenews.com/791
- http://streambase.typepad.com/streambase_stream_process/2008/09/smart-order-routing-and-cep.html
- http://complexevents.com/wp-content/uploads/2008/09/streambase_whitepaper_smart_order_routing.pdf
- http://www.futuresindustry.org/downloads/Jul-Aug-Algo.pdf
- http://www.pipelinetrading.com/resources/wst7371_final.pdf
- http://www.tradingtechnologies.com/news/TT_FA05.pdf
FIX
Tuesday
Visual C++ and Trading Systems