Monday, December 22, 2008

Thoughts on fMRI

Researchers at the ATR Computational Neuroscience Lab in Japan have made some interesting progress using fMRI for visual letter recognition, and it has me thinking about the field in more depth (http://www.medgadget.com/archives/2008/12/fmri_extracts_images_from_the_brain.html). They also discuss the possibility of using it for dream analysis. Check out their site at http://www.cns.atr.jp/indexE.html. Still researching and learning, I thought I would examine packages for the R language. The fmri package in R is a library of functions for performing fMRI analysis based on Tabelow et al. (2006). The web site for their work is at http://www.wias-berlin.de/people/tabelow/doc/tsb2008.pdf.

The reference manual can be found at http://cran.r-project.org/web/packages/fmri/fmri.pdf. The package has routines for Non-Gaussian Component Analysis based on Blanchard et al. (2005) and for Independent Component Analysis. Also, the expected BOLD response can be created from task indicator functions, as shown in the manual. A good paper on ICA by Bai, Shen and Truong (2006) is at http://www.samsi.info/200607/ranmat/workinggroup/rc/ranmat-rc/Truong_main.pdf. They use the R package called AnalyzeFMRI; its manual is at http://cran.r-project.org/web/packages/AnalyzeFMRI/AnalyzeFMRI.pdf.
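As an aside on the expected BOLD response mentioned above: it is essentially the convolution of the task indicator function with a hemodynamic response function (HRF). Here is a rough standalone C++ illustration of that idea (not the fmri package's R routine); the double-gamma HRF parameters, block design, scan count and TR are all assumptions for illustration.

#include <math.h>
#include <stdio.h>
#include <vector>

// (n)! helper for integer shape parameters
static double fact(int n)
{
    double f = 1.0;
    for (int i = 2; i <= n; i++) f *= i;
    return f;
}

// Gamma density with integer shape k and scale theta
static double gamma_pdf(double t, int k, double theta)
{
    return pow(t, k - 1.0) * exp(-t / theta) / (fact(k - 1) * pow(theta, (double)k));
}

// Double-gamma HRF (peak near 5 s, undershoot near 15 s); these are
// common but assumed parameter values
static double hrf(double t)
{
    if (t <= 0.0) return 0.0;
    return gamma_pdf(t, 6, 1.0) - gamma_pdf(t, 16, 1.0) / 6.0;
}

int main(void)
{
    const int nscans = 100;   // number of volumes (assumed)
    const double TR = 2.0;    // repetition time in seconds (assumed)

    // Task indicator: alternating off/on blocks of 10 scans (20 s) each
    std::vector<double> stim(nscans, 0.0);
    for (int i = 0; i < nscans; i++)
        if ((i / 10) % 2 == 1) stim[i] = 1.0;

    // Expected BOLD response = discrete convolution of the indicator
    // with the HRF, sampled at the scan times
    std::vector<double> bold(nscans, 0.0);
    for (int i = 0; i < nscans; i++)
        for (int j = 0; j <= i; j++)
            bold[i] += stim[j] * hrf((i - j) * TR) * TR;

    for (int i = 0; i < nscans; i++)
        printf("%d\t%g\n", i, bold[i]);
    return 0;
}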

To get started using these packages, there are instructions at https://mri.radiology.uiowa.edu/mediawiki/index.php/FMR_Analysis_in_R that involve converting an AFNI image into a single 4D Analyze image and then creating a mask. AFNI (Analysis of Functional Neuroimages) is an open source application for fMRI data analysis at http://afni.nimh.nih.gov/afni/. Currently, there is no port of the application to the Windows environment.

I have done some initial experiments with both packages using simulated data and will discuss them more in coming posts. To put these packages in perspective, there is a good, but dated, article by Gold et al. (1998), "Functional MRI Statistical Software Packages: A Comparative Analysis". As for the teaching aspect, I like the paper by Lai et al., "Teaching Statistical Analysis of fMRI Data", at http://www.vanth.org/docs/Lai_Gollub_Greenberg_ASEE03.pdf; however, they use Matlab software for their work.

References

Blanchard, G., Kawanabe, M., Sugiyama, M., Spokoiny, V. and Müller, K.-R. (2005). In Search of Non-Gaussian Components of a High-Dimensional Distribution. Journal of Machine Learning Research, pp. 1-48.

Tabelow, K., Polzehl, J., Voss, H.U., and Spokoiny, V. (2006). Analysing fMRI experiments with structure adaptive smoothing procedures. NeuroImage, 33:55-62.

Thursday, December 18, 2008

Time Series Research for Neuroscience

In working on my research agenda and research grants for UOP, I return to my thoughts from my time at WVU on the development of spatial-temporal models for neuroscientific research. So far, the work on this blog has examined algorithmic trading with time series methods, and these techniques can easily be extended to modeling electrical brain activity. Thus, one does not really shift gears in looking at the human brain as an investment portfolio that contains different financial instruments sold in different spatial markets. Real-time is "real-time", and one architecture can easily apply to the other if you understand the essence, i.e., a Plato reference.

For example, recent developments in the use of both spatial and time series methods for modeling measurements from MEG/EEG data have become important. By nature, time series methods are confined to the temporal domain, but in this context they must apply to the spatial domain as well. For example, electromagnetic changes are measured at 20 to 100 different locations on the brain's surface every ten milliseconds. As another example, functional MRI (fMRI) data, which generate a time series of over 140,000 dimensions every 2-3 seconds, require spatial-temporal modeling as well. Because of the complexity in both the spatial and temporal domains, the neuroscientific community is expected to build models that both describe and help us understand this complexity. Herein lies the overlap between building trading models for the stock market at different frequencies and modeling the brain.

Examples of computational neuroscience software for time series data can be found at http://home.earthlink.net/~perlewitz/sftwr.html#timeseries and come in a wide variety of flavors. However, I like the approach mentioned in my last blog entry by Kratzig of building user-interface frameworks, such as in Java, that accommodate implementation of the latest advances in research by using the APIs of statistical engines. Furthermore, the use of different model ontologies developed in Protégé and OWL (i.e., models, parameters, algorithms, statistics, diagnostics) that can be shared would lead to great progress in the area. Ultimately, instead of all possible models being estimated, analyzed and deployed, spatial time series characteristics can suggest different models in the form of an expert system to provide real-time analysis and prediction. Microsoft's SQL Server 2008 now provides a spatial engine to combine with its temporal data types, which can aid in this type of pattern recognition.

Current methods such as ICA (Independent Component Analysis) and SPM (Statistical Parametric Mapping) have become popular techniques, but they ignore the stochastic dynamics inherent in the time series. Of course, developments in spatial statistics and the use of improved probability models, with both frequentist and Bayesian approaches, for modeling the higher moments (i.e., conditional means, variances, skewness and kurtosis) lead to models with more realistic dynamics that can show meaningful cross-correlations, co-integrations and emergent phenomena, i.e., chaos theory. Furthermore, the use of wavelet techniques to decompose the time series among different spatial channels provides additional opportunities to gain valuable insight into the frequencies, harmonics and their correlations in both the spatial and temporal domains.

My current and ongoing research is four-fold:

(1) Understanding the different model types presented in the literature for spatial-temporal modeling, in both the frequentist and Bayesian paradigms, available to the neuroscientist

(2) The development and deployment in different languages of the statistical algorithms for (1)

(3) The construction of software, i.e., APIs for different statistical engines, that implements both (1) and (2) in the context of solving and describing neuroscientific problems as mentioned above

(4) Description of ways to move (1)-(3) into expert systems to aid in the accurate diagnosis of pathologies for training fellow neuroscientists

All four parts of this research agenda fit the Boyer model through discovery, teaching, integration, and application. Ultimately, architectures are ontologies as well, and there will come a time when they will automatically generate code to solve particular problems without any human intervention. Meanwhile, experimentation is needed to distill these rules... But I digress.

The reference list below is just a small sample of the developing literature in this regard.

References

Galka, A., Yamashita, O. and Ozaki, T. (2004) "GARCH modelling of covariance in dynamical estimation of inverse solutions", Physics Letters A, 333, 261-268.

Jimenez, J.C. and Ozaki, T. (2005) "An approximate innovation method for the estimation of diffusion processes from discrete data", Journal of Time Series Analysis, in press.

Riera, J., Aubert, E., Iwata, K., Kawashima, R., Wan, X. and Ozaki, T. (2005) "Fusing EEG and fMRI based on a bottom-up model: inferring activation and effective connectivity in neural masses", Phil. Trans. of the Royal Society, Biological Sciences, vol. 360, no. 1457, 1025-1041.

Riera, J., Yamashita, O., Kawashima, R., Sadato, N., Okada, T. and Ozaki, T. (2004) "fMRI activation maps based on the NN-ARX model", NeuroImage, 23, 680-697.

Wong, K.F., Galka, A., Yamashita, O. and Ozaki, T. (2005) "Modelling non-stationary variance in EEG time series by state space GARCH model", Computers in Biology and Medicine, in press.

Yamashita, O., Sadato, N., Okada, T. and Ozaki, T. (2005) "Evaluating frequency-wise directed connectivity of BOLD signals applying relative power contribution with the linear multivariate time series models", NeuroImage, vol. 25, 478-490.

Yamashita, O., Galka, A., Ozaki, T., Biscay, R. and Valdes-Sosa, P. (2004) "Recursive least squares solution for dynamic inverse problems of EEG generation", Human Brain Mapping, vol. 21, issue 4, 221-235.

Tuesday, December 16, 2008

Java and Time Series Analysis

In the course of my exploration of numerical engines to perform univariate and multivariate time series analysis, I found a dissertation and a series of papers by Dr. Markus Kratzig at the Institute of Statistics and Econometrics on using a Java-based application framework to interface with mathematical and statistical engines such as Gauss, Ox and R. His free software, JMulTi, can be obtained from his website and provides a very intuitive, easy-to-use, extensible application for doing multiple time series analysis with ARIMA, ARCH/GARCH, VAR and ECM models.


Furthermore, by using the Java classes with an IDE like Eclipse, the graphical user interface framework makes it easy to build your own application around other statistical engines. I had the same idea for using .NET, but his API is far more developed than mine. Using GAUSS/MatLab, one can compile routines and easily integrate them into the environment. Of course, it does not have to be a commercial application; one could use R, the matrix package JAMA, or GSL.


There is a lot of value in this work for scholars learning about the application of time series methods, computer science and programming models, and numerical algorithms for cutting-edge research. Personally, I am going through the rJava/JRI code to use R with Java. In previous posts, I have written about how to use R with C#.NET and would like to be able to do the same with Java within this framework. There is no sense in reinventing the wheel, and it is easier to think in terms of collages than to be a purist armed with only a single language. The author provides easy-to-follow examples in his paper and website for such work, and again there is a lot of valuable information to be gained in the exercise. The Help system for JMulTi, which provides in-depth documentation on models, diagnostics, etc., is recommended as well.

In reading journal articles, everyone has their preferences as well as intellectual investments, so I think it is important to be flexible, open, and use what's available to achieve the end result.

Wednesday, December 10, 2008

Vectors, Likelihoods and Partial Derivatives

The basic architecture for the open source application Cronos contains TimeSeries, TimeSeriesModel and ARMAModel classes to get started. These classes, i.e., 46 pages of code, require an understanding of linear algebra, minimization and partial differentiation from the GSL perspective. The matrix and vector classes supplied would not compile, so, in an attempt to gain a better understanding, I am rewriting the code in a reduced version using just gsl_vector and gsl_matrix for .NET. References can be found in this book:

GNU Scientific Library Reference Manual - Revised Second Edition (v1.8), by M. Galassi, J. Davies, J. Theiler, B. Gough, G. Jungman, M. Booth and F. Rossi. Paperback (6"x9"), 636 pages, 60 figures. ISBN 0954161734. RRP £24.99 ($39.99).

gsl_vector samples:
http://www.network-theory.co.uk/docs/gslref/Exampleprogramsforvectors.html

gsl_matrix samples:
http://www.network-theory.co.uk/docs/gslref/Exampleprogramsformatrices.html
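To make the direction concrete, here is a minimal sketch (my own illustration, not the Cronos classes) of the kind of thin wrapper over gsl_vector I have in mind; a Matrix class over gsl_matrix would follow the same pattern:

#include <new>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_blas.h>

// Thin wrapper owning a gsl_vector and handling allocation/cleanup
class Vector {
public:
    explicit Vector(size_t n) : v_(gsl_vector_calloc(n)) {
        if (!v_) throw std::bad_alloc();
    }
    ~Vector() { gsl_vector_free(v_); }

    size_t size() const { return v_->size; }
    double get(size_t i) const { return gsl_vector_get(v_, i); }
    void set(size_t i, double x) { gsl_vector_set(v_, i, x); }
    gsl_vector* raw() { return v_; }   // escape hatch for direct GSL calls

    // Dot product via BLAS level 1
    double dot(const Vector& other) const {
        double result = 0.0;
        gsl_blas_ddot(v_, other.v_, &result);
        return result;
    }

private:
    Vector(const Vector&);             // non-copyable: declared, not defined
    Vector& operator=(const Vector&);
    gsl_vector* v_;
};

Keeping the raw() escape hatch means the full GSL API stays available while the wrapper takes care of allocation and cleanup.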

I want to be able to compute the log-likelihood from Brockwell and Davis (1991) and to use different criteria such as AIC, BIC, etc., as shown in my time series books. Basically, it is a minimization of -(log-likelihood) that uses gsl_multimin_function and gsl_multimin_fminimizer.

Here is an example:
http://aldebaran.devinci.fr/~cagnol/promotion2007/cs302/gsl/multimin/gsl_multimin.h.html
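Below is a minimal sketch of that setup, assuming a toy quadratic in place of the real ARMA log-likelihood; the AIC/BIC arithmetic at the end assumes a hypothetical sample size of 100.

#include <math.h>
#include <stdio.h>
#include <gsl/gsl_multimin.h>

// Stand-in for -(log-likelihood): a simple quadratic bowl so the example
// is self-contained. In the real case this would evaluate the Gaussian
// likelihood of an ARMA model at the parameter vector v.
double neg_loglik(const gsl_vector *v, void *params)
{
    double phi   = gsl_vector_get(v, 0);
    double theta = gsl_vector_get(v, 1);
    return (phi - 0.5) * (phi - 0.5) + (theta - 0.2) * (theta - 0.2);
}

int main(void)
{
    const size_t n = 2;                        // number of parameters

    gsl_multimin_function f;
    f.f = &neg_loglik;
    f.n = n;
    f.params = NULL;

    gsl_vector *x = gsl_vector_calloc(n);      // starting values (0, 0)
    gsl_vector *step = gsl_vector_alloc(n);    // initial simplex step sizes
    gsl_vector_set_all(step, 0.1);

    gsl_multimin_fminimizer *s =
        gsl_multimin_fminimizer_alloc(gsl_multimin_fminimizer_nmsimplex, n);
    gsl_multimin_fminimizer_set(s, &f, x, step);

    int status = GSL_CONTINUE;
    for (int iter = 0; iter < 200 && status == GSL_CONTINUE; iter++) {
        if (gsl_multimin_fminimizer_iterate(s))
            break;                             // iteration could not improve
        status = gsl_multimin_test_size(gsl_multimin_fminimizer_size(s), 1e-6);
    }

    double nll = s->fval;                      // minimized -(log-likelihood)
    double k = (double)n;
    printf("phi = %g, theta = %g\n",
           gsl_vector_get(s->x, 0), gsl_vector_get(s->x, 1));
    printf("AIC = %g, BIC = %g\n",
           2.0 * nll + 2.0 * k,                // AIC = -2 log L + 2k
           2.0 * nll + k * log(100.0));        // BIC = -2 log L + k log(n_obs)

    gsl_multimin_fminimizer_free(s);
    gsl_vector_free(x);
    gsl_vector_free(step);
    return 0;
}

The simplex method avoids derivatives of the likelihood, which is convenient at this stage; the information matrix from numerical second derivatives can come afterward.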

Furthermore, I have to be able to calculate the partial derivatives of the log-likelihood to obtain the information matrix for the parameters. For this I can use gsl_deriv_central.

An example:
http://www.gnu.org/software/gsl/manual/html_node/Numerical-Differentiation-functions.html
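Here is a minimal sketch of that call, using a toy one-parameter function in place of the real log-likelihood:

#include <stdio.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_deriv.h>

// Toy one-parameter "log-likelihood"; in practice this would hold the data
// fixed and vary a single model parameter.
double loglik(double phi, void *params)
{
    return -(phi - 0.5) * (phi - 0.5);
}

int main(void)
{
    gsl_function F;
    F.function = &loglik;
    F.params = NULL;

    double result, abserr;
    // Central-difference estimate of d(loglik)/d(phi) at phi = 0.3,
    // with step size 1e-8
    gsl_deriv_central(&F, 0.3, 1e-8, &result, &abserr);
    printf("dl/dphi = %g (+/- %g)\n", result, abserr);
    return 0;
}

For the information matrix, the same call can be applied parameter by parameter, and to the first derivatives again for the second-order terms.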

A good example on non-linear least squares can be found at:

http://www.physics.brocku.ca/~tharroun/parratt/group__lstsq.html

and multidimensional minimization

http://www.physics.brocku.ca/~tharroun/gsl_fit/group__mdmin.html

Currently, I am working on some examples in each of these areas to post, and I am making modifications to create a forecastable ARMA model class that I can extend to GARCH and SVM. This will enable me to use additional research algorithms in these components.

References

Brockwell, P.J. and Davis, R.A. (1991). Time Series: Theory and Methods, second edition. Springer-Verlag, New York.

Stroustrup, B. (1997). The C++ Programming Language (3rd ed.), Section 22.4, Vector Arithmetic. Addison-Wesley. ISBN 0-201-88954-4.

Wednesday, December 3, 2008

GSL 1.11 and .NET VC++ 2008

Well, I could not sleep tonight with thoughts of France on my mind and fighting flu symptoms, so I thought I would run an example in Visual Studio 2008 C++ with GSL. After some trial and error, I went to the site of David Geldreich, downloaded his 1.11 version binaries for Windows, and made the changes suggested by his ReadMe document, which I quote here:

"...Settings you have to change when creating your own project :

- additional include directory should point to gsl\include
- additional library directory should point to gsl\lib
- Code generation : use /MDd for Debug and /MD for Release
- Debug config should link with "cblas_d.lib gsl_d.lib"
- Release config should link with "cblas.lib gsl.lib"..."

His example code for solving a linear system with matrix decompositions, which I got to compile, link and run in VC++ 2008 Express, is reproduced here.

#include <stdio.h>
#include <stdlib.h>
#include <gsl/gsl_linalg.h>
////////////////////////////////////////////////////////////
// Solve Ax = b with LU and Cholesky decompositions
int main(int argc, char **argv)
{
    printf("=========== tst2 ===========\n");

    // Coefficient matrix A (5x5, symmetric positive definite) and RHS b
    double a_data[] = { 2,1,1,3,2,
                        1,2,2,1,1,
                        1,2,9,1,5,
                        3,1,1,7,1,
                        2,1,5,1,8 };
    double b_data[] = { -2,4,3,-5,1 };

    gsl_vector *x = gsl_vector_alloc(5);
    gsl_permutation *p = gsl_permutation_alloc(5);
    gsl_matrix_view m = gsl_matrix_view_array(a_data, 5, 5);
    gsl_vector_view b = gsl_vector_view_array(b_data, 5);
    int s;

    // LU decomposition (overwrites m in place), then solve
    gsl_linalg_LU_decomp(&m.matrix, p, &s);
    gsl_linalg_LU_solve(&m.matrix, p, &b.vector, x);
    printf("x = \n");
    gsl_vector_fprintf(stdout, x, "%g");

    // Fresh copies of A and b, since the LU step destroyed the originals
    double a2_data[] = { 2,1,1,3,2,
                         1,2,2,1,1,
                         1,2,9,1,5,
                         3,1,1,7,1,
                         2,1,5,1,8 };
    double b2_data[] = { -2,4,3,-5,1 };
    gsl_matrix_view m2 = gsl_matrix_view_array(a2_data, 5, 5);
    gsl_vector_view b2 = gsl_vector_view_array(b2_data, 5);

    // Cholesky decomposition and solve (valid because A is symmetric
    // positive definite)
    gsl_linalg_cholesky_decomp(&m2.matrix);
    gsl_linalg_cholesky_solve(&m2.matrix, &b2.vector, x);
    printf("x = \n");
    gsl_vector_fprintf(stdout, x, "%g");

    gsl_permutation_free(p);
    gsl_vector_free(x);
    system("pause");
    return 0;
}

Now that I have a simple GSL example working, I can look at the stochastic volatility architecture and source code presented by Anthony Brockwell and write a prototype. Thanks, David, for making these binaries and code available for download. Success always makes one feel a little better.

.NET Stochastic Volatility Models


Based on some conversations with others on the use of 3rd party products, I have been putting the materials together for building C++.NET or C#.NET algorithms for Stochastic Volatility models (SVM) for my paper. In the course of my research, I came across the following open source application at Carnegie-Mellon for doing time series analysis.

http://www.stat.cmu.edu/~abrock/cronos/index.html

I downloaded the software as well as the source for modifying the C++ algorithms, which are based on the open source, thread-safe GNU Scientific Library (GSL) at

http://www.gnu.org/software/gsl/

for the optimization routines. Of course, I have to ramp up to be able to implement them in Visual Studio 2008 C++ or C# Express, but I should be able to accomplish that this month. Here is an example of the SVM code from the svm.h file:


/*
* -------------------------------------------------------------------
*
* Copyright 2005 Anthony Brockwell
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Library General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Library General Public License for more details.
*
* You should have received a copy of the GNU Library General Public
* License along with this library; if not, write to the Free
* Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*
* -------------------------------------------------------------------
*/
#ifndef SVM_H
#define SVM_H

// simple stochastic volatility model first
// Y_t \sim N(\mu_Y,\sigma_t^2)
// \sigma_t = \exp(X_t + \mu_X)
// X_t = \phi X_{t-1} + \epsilon_t, {\epsilon_t} \sim IIDN(0,\nu^2)
class SSVModel : public TimeSeriesModel {
protected:
    double mean, mux, phi, nu;

public:
    SSVModel(); // constructor
    int FitModel(TimeSeries *ts, const int method, const int ilimit,
                 void (*itercallback)(int, void *), void *cb_parms,
                 ostringstream& msg, ostringstream& supplementalinfo,
                 bool get_parameter_cov=true);
    void SimulateInto(TimeSeries *, int n, bool ovw);
    Matrix SimulatePredictive(TimeSeries&, int nsteps, int nsims);
    void RenderInto(ostringstream&);
    bool RenderCtsVersionInto(ostringstream &output, ostringstream &error, double delta);
    void ComputeACFPACF(Vector &acf, Vector *pacf, bool normalizeacf=true);
    void Forecast(TimeSeries *, int nsteps, double *fret, double *fmse);
    void ComputeStdResiduals(TimeSeries *data, TimeSeries *resids);
    double ComputeLogLikelihood(TimeSeries *);
    bool CheckModel(bool flipit);
    Vector ParameterBundle();
    void UnbundleParameters(Vector& v);
    Vector ValidityConstraint();
    Vector GetMask(); // returns estimation_mask, padded with defaults
    Vector GetDefaultParameters(Vector& mask, TimeSeries *tsp); // returns initial guesses for non-held model parameters; other parameters are fixed at current values
    void StreamParameterName(ostringstream& strm, int parmnum); // nms[0],... must already be valid allocated strings with length >= 20
    Vector ComputeSpectralDensity(Vector& omegas);
    double Cdf(double y, double mu, double sig2);
    double InvCdf(double u, double mu, double sig2);
};

#endif
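To make the model concrete, here is a small standalone sketch of my own (not part of Cronos) that simulates from the SV model defined in the comments above, using GSL's random number generators; the parameter values are made up for illustration:

#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

// Simulate the simple stochastic volatility model from svm.h:
//   X_t     = phi * X_{t-1} + eps_t,  eps_t ~ IID N(0, nu^2)
//   sigma_t = exp(X_t + mu_X)
//   Y_t     ~ N(mu_Y, sigma_t^2)
int main(void)
{
    const int n = 500;
    const double mu_y = 0.0, mu_x = -1.0, phi = 0.95, nu = 0.2; // assumed

    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
    double x = 0.0;
    for (int t = 0; t < n; t++) {
        x = phi * x + gsl_ran_gaussian(r, nu);   // latent log-volatility
        double sigma = exp(x + mu_x);
        double y = mu_y + gsl_ran_gaussian(r, sigma);
        printf("%d\t%g\t%g\n", t, sigma, y);
    }
    gsl_rng_free(r);
    return 0;
}

Estimation is the hard part, of course, but simulation like this is still useful for testing ComputeLogLikelihood against data with known parameters.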


I used the Cronos application to estimate an SVM and produce some forecasts, as shown in Figure 2. The next step is to demonstrate the use of the application and discuss time series analysis in a video.

Figure 2. Cronos SVM estimation and forecasts.
To learn more about Stochastic Volatility models, a good place to start is the presentation by Peter Jackel:

http://www-stat.wharton.upenn.edu/~steele/Courses/95/ResourceDetails/SV/StochasticVolatilityModels-PastPresentAndFuture.pdf

The next step in this continued research on algorithmic trading is to review the C++ architecture for doing an SVM in the context of the algorithmic trading architecture, so I can finish my paper and discuss some of my observations along the way.

Monday, December 1, 2008

Portfolio Optimization


I wanted to touch base on asset allocation, portfolio optimization and risk management, and looked at the product and literature of

They permit using their product for research, which is my intention here.

They have five video lessons on using the software: (1) Data Management, (2) Portfolio Construction, (3) Portfolio Optimization, (4) Value-at-Risk Analysis and (5) the Black-Litterman Model.

They also have an excellent section on features, with associated PDFs, on portfolio construction, parameter estimation, portfolio optimization, target probabilities, Value-at-Risk, historical simulations and data management.
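As a concrete illustration of one of those features, here is a minimal sketch (my own, not the vendor's code) of historical-simulation Value-at-Risk for a portfolio return series; the return numbers are made up:

#include <algorithm>
#include <cstdio>
#include <vector>

// Historical-simulation VaR: sort observed portfolio returns and read off
// the empirical quantile. At 95% confidence, VaR is the loss at the 5th
// percentile of the return distribution.
double historical_var(std::vector<double> returns, double confidence)
{
    std::sort(returns.begin(), returns.end());
    size_t idx = (size_t)((1.0 - confidence) * returns.size());
    return -returns[idx];   // report VaR as a positive loss
}

int main()
{
    // Hypothetical daily portfolio returns
    double sample[] = { -0.021, 0.004, 0.013, -0.008, 0.002, -0.015,
                         0.007, 0.019, -0.003, 0.001, -0.027, 0.011 };
    std::vector<double> r(sample, sample + sizeof(sample) / sizeof(sample[0]));

    printf("95%% one-day VaR: %.4f\n", historical_var(r, 0.95));
    return 0;
}

The vendor's historical simulation is presumably more sophisticated (weighting schemes, full repricing), but the empirical quantile idea is the same.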

An excellent review of multi-period asset allocation for portfolio theory can be obtained at