Friday, January 30, 2009
Genetic Algorithms: Loss Functions
The fitness function in the AForge time series sample accumulates an absolute-error loss term:
error += Math.Abs( y - data[i + windowSize] )
The entire method is presented below:
public double Evaluate( IChromosome chromosome )
{
    // get function in polish notation
    string function = chromosome.ToString( );
    // go through all the data
    double error = 0.0;
    for ( int i = 0, n = data.Length - windowSize - predictionSize; i < n; i++ )
    {
        // put values from current window as variables
        for ( int j = 0, b = i + windowSize - 1; j < windowSize; j++ )
        {
            variables[j] = data[b - j];
        }
        // avoid evaluation errors
        try
        {
            // evaluate the function
            double y = PolishExpression.Evaluate( function, variables );
            // check for correct numeric value
            if ( double.IsNaN( y ) )
                return 0;
            // get the difference between evaluated value and
            // next value after the window, and sum error
            error += Math.Abs( y - data[i + windowSize] );
        }
        catch
        {
            return 0;
        }
    }
    // return optimization function value
    return 100.0 / ( error + 1 );
}
In a presentation by Wang and Zhang on "Risk and Loss Functions" at http://www.ee.columbia.edu/~dpwe/mlsp/yong-chap3.pdf , the authors review squared-error loss, Huber's robust loss, absolute loss, and ε-insensitive loss. They provide applied examples that fit well with Steinwart's work. Clearly, here is an example where one can add value by creating a class of loss functions that are tied to the empirical probability density function (pdf) of the observed data. Of course, such a class can be reused in other estimation problems as well.
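As a sketch of that idea, a small family of interchangeable loss classes could replace the hard-coded absolute error in the Evaluate method above. The interface and class names below are my own illustration, not part of AForge:

using System;
// Hypothetical pluggable loss functions; the names are illustrative only.
public interface ILossFunction
{
    double Loss( double predicted, double actual );
}
public class AbsoluteLoss : ILossFunction
{
    public double Loss( double predicted, double actual )
    {
        return Math.Abs( predicted - actual );
    }
}
public class SquaredLoss : ILossFunction
{
    public double Loss( double predicted, double actual )
    {
        double d = predicted - actual;
        return d * d;
    }
}
// Huber's robust loss: quadratic near zero, linear in the tails.
public class HuberLoss : ILossFunction
{
    private double delta;
    public HuberLoss( double delta ) { this.delta = delta; }
    public double Loss( double predicted, double actual )
    {
        double d = Math.Abs( predicted - actual );
        return ( d <= delta ) ? 0.5 * d * d : delta * ( d - 0.5 * delta );
    }
}

With such a class in hand, the line error += Math.Abs( y - data[i + windowSize] ); becomes error += loss.Loss( y, data[i + windowSize] );, and the empirical pdf of the observed data can drive the choice of loss class or of a parameter like delta.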
Tesla and CUDA
http://www.nvidia.com/object/cuda_learn.html provides the necessary toolkit and SDK to program in C for Tesla. There is a good article on "Parallel Processing with CUDA" at
http://www.nvidia.com/docs/IO/55972/220401_Reprint.pdf .
For an example of both CPU and CUDA code, consider the matrix addition example by Seland at
http://heim.ifi.uio.no/~knutm/geilo2008/seland.pdf .
A: CPU Code
void add_matrix( float* a, float* b, float* c, int N )
{
    int index;
    for ( int i = 0; i < N; ++i )
        for ( int j = 0; j < N; ++j ) {
            index = i + j*N;
            c[index] = a[index] + b[index];
        }
}

int main() {
    // a, b, c and N are allocated and defined elsewhere in the slides
    add_matrix( a, b, c, N );
}
B: CUDA Code
//Compute Kernel
__global__ void add_matrix( float* a, float* b, float* c, int N )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    int index = i + j*N;
    if ( i < N && j < N )
        c[index] = a[index] + b[index];
}

int main() {
    dim3 dimBlock( blocksize, blocksize );
    dim3 dimGrid( N/dimBlock.x, N/dimBlock.y );
    add_matrix<<<dimGrid, dimBlock>>>( a, b, c, N );
}
Notice that the double for loop on the CPU is replaced with a grid of threads on the GPU. The presentation examines the use of threads in thread blocks contained in a grid of thread blocks. We can extend the main function:
//Define Grid Size
const int N = 1024;
const int blocksize = 16;

int main() {
    //CPU Memory allocation
    float *a = new float[N*N];
    float *b = new float[N*N];
    float *c = new float[N*N];
    for ( int i = 0; i < N*N; ++i ) {
        a[i] = 1.0f; b[i] = 3.5f;
    }
    //GPU Memory allocation
    float *ad, *bd, *cd;
    const int size = N*N*sizeof(float);
    cudaMalloc( (void**)&ad, size );
    cudaMalloc( (void**)&bd, size );
    cudaMalloc( (void**)&cd, size );
    //Copy data to GPU
    cudaMemcpy( ad, a, size, cudaMemcpyHostToDevice );
    cudaMemcpy( bd, b, size, cudaMemcpyHostToDevice );
    //Execute Kernel
    dim3 dimBlock( blocksize, blocksize );
    dim3 dimGrid( N/dimBlock.x, N/dimBlock.y );
    add_matrix<<<dimGrid, dimBlock>>>( ad, bd, cd, N );
    //Copy result back to CPU
    cudaMemcpy( c, cd, size, cudaMemcpyDeviceToHost );
    //Clean Up and Return
    cudaFree( ad ); cudaFree( bd ); cudaFree( cd );
    delete[] a; delete[] b; delete[] c;
    return EXIT_SUCCESS;
}
Following these basics and using the examples in the CUDA SDK, you can easily be running parallel programs for GIS and imaging applications on your very own personal supercomputer.
Genetic Programming for Time Series
http://www.codeproject.com/KB/recipes/aforge.aspx
Figure 1 shows the interface for obtaining the prediction of a growing sinusoid.

The application uses three AForge components, AForge, AForge.Genetic, and AForge.Controls, for the estimation and the interface. Since the copyright permits redistribution for non-commercial use, I wanted to include it here to give an idea of how quickly one can create an application from an existing framework for the prediction of univariate time series such as stock prices. Also, note the framework construction and the use of worker threads. There is a worker thread, SearchSolution, for finding the solution and two delegates for asynchronous calls to set control properties.
private delegate void SetTextCallback( System.Windows.Forms.Control control, string text );
private delegate void AddSubItemCallback( System.Windows.Forms.ListView control, int item, string subitemText );
I have removed some of the code because of length, in the declaration-of-controls region and in the InitializeComponent() method.
// AForge Framework
// Time Series Prediction using Genetic Programming and Gene Expression Programming
//
// Copyright © Andrew Kirillov, 2006
// andrew.kirillov@gmail.com
//
using System;
using System.Drawing;
using System.Collections;
using System.ComponentModel;
using System.Windows.Forms;
using System.Data;
using System.IO;
using System.Threading;
using AForge;
using AForge.Genetic;
using AForge.Controls;
namespace TimeSeries
{
/// <summary>
/// Summary description for Form1.
/// </summary>
public class MainForm : System.Windows.Forms.Form
{
private System.Windows.Forms.GroupBox groupBox1; // remaining control declarations removed for the sake of exposition
//
private double[] data = null;
private double[,] dataToShow = null;
private int populationSize = 40;
private int iterations = 100;
private int windowSize = 5;
private int predictionSize = 1;
private int selectionMethod = 0;
private int functionsSet = 0;
private int geneticMethod = 0;
private int headLength = 20;
private Thread workerThread = null;
private bool needToStop = false;
private double[,] windowDelimiter = new double[2, 2] { { 0, 0 }, { 0, 0 } };
private double[,] predictionDelimiter = new double[2, 2] { { 0, 0 }, { 0, 0 } };
// Constructor
public MainForm( )
{
//
// Required for Windows Form Designer support
//
InitializeComponent();
//
chart.AddDataSeries( "data", Color.Red, Chart.SeriesType.Dots, 5 );
chart.AddDataSeries( "solution", Color.Blue, Chart.SeriesType.Line, 1 );
chart.AddDataSeries( "window", Color.LightGray, Chart.SeriesType.Line, 1, false );
chart.AddDataSeries( "prediction", Color.Gray, Chart.SeriesType.Line, 1, false );
selectionBox.SelectedIndex = selectionMethod;
functionsSetBox.SelectedIndex = functionsSet;
geneticMethodBox.SelectedIndex = geneticMethod;
UpdateSettings( );
}
/// <summary>
/// Clean up any resources being used.
/// </summary>
protected override void Dispose( bool disposing )
{
if( disposing )
{
if (components != null)
{
components.Dispose();
}
}
base.Dispose( disposing );
}
#region Windows Form Designer generated code
/// <summary>
/// Required method for Designer support - do not modify
/// the contents of this method with the code editor.
/// </summary>
private void InitializeComponent()
{
//
//Removed the code
//
this.components = new System.ComponentModel.Container();
}
#endregion
/// <summary>
/// The main entry point for the application.
/// </summary>
[STAThread]
static void Main( )
{
Application.Run( new MainForm( ) );
}
// Delegates to enable async calls for setting controls properties
private delegate void SetTextCallback( System.Windows.Forms.Control control, string text );
private delegate void AddSubItemCallback( System.Windows.Forms.ListView control, int item, string subitemText );
// Thread safe updating of control's text property
private void SetText( System.Windows.Forms.Control control, string text )
{
if ( control.InvokeRequired )
{
SetTextCallback d = new SetTextCallback( SetText );
Invoke( d, new object[] { control, text } );
}
else
{
control.Text = text;
}
}
// Thread safe adding of subitem to list control
private void AddSubItem( System.Windows.Forms.ListView control, int item, string subitemText )
{
if ( control.InvokeRequired )
{
AddSubItemCallback d = new AddSubItemCallback( AddSubItem );
Invoke( d, new object[] { control, item, subitemText } );
}
else
{
control.Items[item].SubItems.Add( subitemText );
}
}
// On main form closing
private void MainForm_Closing(object sender, System.ComponentModel.CancelEventArgs e)
{
// check if worker thread is running
if ( ( workerThread != null ) && ( workerThread.IsAlive ) )
{
needToStop = true;
while ( !workerThread.Join( 100 ) )
Application.DoEvents( );
}
}
// Update settings controls
private void UpdateSettings( )
{
populationSizeBox.Text = populationSize.ToString( );
iterationsBox.Text = iterations.ToString( );
windowSizeBox.Text = windowSize.ToString( );
predictionSizeBox.Text = predictionSize.ToString( );
}
// Load data
private void loadDataButton_Click(object sender, System.EventArgs e)
{
// show file selection dialog
if ( openFileDialog.ShowDialog( ) == DialogResult.OK )
{
StreamReader reader = null;
// read maximum 50 points
double[] tempData = new double[50];
try
{
// open selected file
reader = File.OpenText( openFileDialog.FileName );
string str = null;
int i = 0;
// ... the rest of the data reading code has been removed for length ...
}
finally
{
// close the file
if ( reader != null )
reader.Close( );
}
}
}
// Update prediction size
private void UpdatePredictionSize( )
{
if ( data != null )
{
// get new prediction size value
try
{
predictionSize = Math.Max( 1, Math.Min( 10, int.Parse(
predictionSizeBox.Text ) ) );
}
catch
{
predictionSize = 1;
}
// check if we have too few data
if ( data.Length - predictionSize - 1 < windowSize )
predictionSize = 1;
}
}
// Clear current solution
private void ClearSolution( )
{
chart.UpdateDataSeries( "solution", null );
solutionBox.Text = string.Empty;
currentIterationBox.Text = string.Empty;
currentLearningErrorBox.Text = string.Empty;
currentPredictionErrorBox.Text = string.Empty;
// remove previously added prediction subitems from the list
for ( int i = 0, n = dataList.Items.Count; i < n; i++ )
{
if ( dataList.Items[i].SubItems.Count > 1 )
dataList.Items[i].SubItems.RemoveAt( 1 );
}
}
// On button "Start"
private void startButton_Click( object sender, System.EventArgs e )
{
ClearSolution( );
// get population size
try
{
populationSize = Math.Max( 10, Math.Min( 100, int.Parse( populationSizeBox.Text
) ) );
}
catch
{
populationSize = 40;
}
// iterations
try
{
iterations = Math.Max( 0, int.Parse( iterationsBox.Text ) );
}
catch
{
iterations = 100;
}
// update settings controls
UpdateSettings( );
selectionMethod = selectionBox.SelectedIndex;
functionsSet = functionsSetBox.SelectedIndex;
geneticMethod = geneticMethodBox.SelectedIndex;
// disable all settings controls except "Stop" button
EnableControls( false );
// run worker thread
needToStop = false;
workerThread = new Thread( new ThreadStart( SearchSolution ) );
workerThread.Start( );
}
// On button "Stop"
private void stopButton_Click( object sender, System.EventArgs e )
{
// stop worker thread
needToStop = true;
while ( !workerThread.Join( 100 ) )
Application.DoEvents( );
workerThread = null;
}
// Worker thread
void SearchSolution( )
{
// constants
double[] constants = new double[10] { 1, 2, 3, 5, 7, 11, 13, 17, 19, 23 };
// create fitness function
TimeSeriesPredictionFitness fitness = new TimeSeriesPredictionFitness(
data, windowSize, predictionSize, constants );
// create gene function
IGPGene gene = ( functionsSet == 0 ) ?
(IGPGene) new SimpleGeneFunction( windowSize + constants.Length ) :
(IGPGene) new ExtendedGeneFunction( windowSize + constants.Length );
// create population
Population population = new Population( populationSize, ( geneticMethod == 0 ) ?
(IChromosome) new GPTreeChromosome( gene ) :
(IChromosome) new GEPChromosome( gene, headLength ),
fitness, ( selectionMethod == 0 ) ? (ISelectionMethod) new EliteSelection( ) :
( selectionMethod == 1 ) ? (ISelectionMethod) new RankSelection( ) :
(ISelectionMethod) new RouletteWheelSelection( )
);
// iterations
int i = 1;
// solution array
int solutionSize = data.Length - windowSize;
double[,] solution = new double[solutionSize, 2];
double[] input = new double[windowSize + constants.Length];
// calculate X values to be used with solution function
for ( int j = 0; j < solutionSize; j++ )
{
solution[j, 0] = j + windowSize;
}
// loop
while ( !needToStop )
{
// run one epoch of the genetic algorithm
population.RunEpoch( );
try
{
// get the best function found so far
string bestFunction = population.BestChromosome.ToString( );
// calculate learning and prediction errors
double learningError = 0.0;
double predictionError = 0.0;
// go through all the data
for ( int j = 0, n = data.Length - windowSize; j < n; j++ )
{
// put values from current window as variables
for ( int k = 0, b = j + windowSize - 1; k < windowSize; k++ )
{
input[k] = data[b - k];
}
// evaluate the function
solution[j, 1] = PolishExpression.Evaluate( bestFunction, input );
// sum the error over the prediction window or the learning set
if ( j >= n - predictionSize )
{
predictionError += Math.Abs( solution[j, 1] - data
[windowSize + j] );
}
else
{
learningError += Math.Abs( solution[j, 1] - data
[windowSize + j] );
}
}
// update solution on the chart
chart.UpdateDataSeries( "solution", solution );
// set current iteration's info
SetText( currentIterationBox, i.ToString( ) );
SetText( currentLearningErrorBox, learningError.ToString( "F3" ) );
SetText( currentPredictionErrorBox, predictionError.ToString( "F3" ) );
}
catch
{
// remove any solutions from chart in case of any errors
chart.UpdateDataSeries( "solution", null );
}
// increase current iteration
i++;
//
if ( ( iterations != 0 ) && ( i > iterations ) )
break;
}
// show solution
SetText( solutionBox, population.BestChromosome.ToString( ) );
for ( int j = windowSize, k = 0, n = data.Length; j < n; j++, k++ ) {
AddSubItem( dataList, j, solution[k, 1].ToString( ) );
}
// enable settings controls
EnableControls( true );
}
// On "More settings" button click
private void moreSettingsButton_Click( object sender, System.EventArgs e ) {
ExSettingsDialog settingsDlg = new ExSettingsDialog( );
// init the dialog
settingsDlg.MaxInitialTreeLevel = GPTreeChromosome.MaxInitialLevel;
settingsDlg.MaxTreeLevel = GPTreeChromosome.MaxLevel;
settingsDlg.HeadLength = headLength;
// show the dialog
if ( settingsDlg.ShowDialog( ) == DialogResult.OK ) {
GPTreeChromosome.MaxInitialLevel = settingsDlg.MaxInitialTreeLevel;
GPTreeChromosome.MaxLevel = settingsDlg.MaxTreeLevel;
headLength = settingsDlg.HeadLength;
}
}
}
}
Reading this code, one can see that it is easy to use these three components to create a GUI that performs univariate time series analysis with genetic algorithms. Thus, you can download a historical price series from Yahoo Finance, plug it into this application or your own customization, and write an article on using genetic algorithms for forecasting stock prices at different temporal intervals. Furthermore, you could build on the interface to add more features and/or update the methods in the AForge.Genetic component.
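As a hedged sketch of the Yahoo Finance idea, the helper below converts a downloaded CSV of quotes into the one-value-per-line text file the sample's load button reads. The file names and the position of the Close column are assumptions, not part of the AForge sample:

using System;
using System.Collections.Generic;
using System.IO;

class YahooCsvToSeries
{
    // Hypothetical helper: extract the closing-price column from a
    // downloaded CSV (header assumed to be "Date,Open,High,Low,Close,...")
    // and write one value per line for the sample's file loader.
    static void Main()
    {
        List<string> closes = new List<string>();
        using ( StreamReader reader = File.OpenText( "prices.csv" ) )
        {
            reader.ReadLine( ); // skip the header row
            string line;
            while ( ( line = reader.ReadLine( ) ) != null )
            {
                string[] fields = line.Split( ',' );
                closes.Add( fields[4] ); // "Close" column (assumed position)
            }
        }
        File.WriteAllLines( "series.txt", closes.ToArray( ) );
    }
}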
Thursday, January 29, 2009
Agent Grammar: The Interpreter Design Pattern
See http://www.codeproject.com/KB/architecture/InterpreterPatternParsing.aspx for "Parsing Algebraic Expressions Using the Interpreter Pattern". The classes used in this article are AXExceptions, AXParser, AXValidator, ExpressionContext, Expressions, ExpressionFactory, and AXTokenizer. The idea is to perform lexical analysis by transforming a character sequence into a token sequence. The token sequence is validated, and the ExpressionFactory is used to translate it into a sequence of Expression objects, which form an expression tree. This tree is then parsed and evaluated by substitution. Besides the interpreter pattern, the factory, template method, and composite patterns are included in this application as well.
I wanted to show some of the building blocks that I am using for my application with the following from the article:
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
namespace AgentGrammar
{
    internal class Token
    {
        private Dictionary<string, Regex> _patterns;

        public Token()
        {
            _patterns = new Dictionary<string, Regex>();
        }

        public void AddPattern(string pattern)
        {
            if (string.IsNullOrEmpty(pattern))
                throw new AXException("The pattern cannot be null or an empty string.");
            if (_patterns.ContainsKey(pattern))
                throw new AXException("The pattern has already been added to the tokenizer.");
            try
            {
                _patterns.Add(pattern, new Regex(pattern));
            }
            catch
            {
                throw new AXException("The pattern must be a valid regular expression.");
            }
        }

        public bool RemovePattern(string pattern)
        {
            return _patterns.Remove(pattern);
        }

        public List<string> Tokenize(string text)
        {
            List<string> tokens = new List<string>();
            for (int i = 0; i < text.Length; i++)
            {
                bool matched = false;
                // try the longest remaining substring first, then shrink it
                for (int j = text.Length - i; j > 0 && !matched; j--)
                {
                    foreach (KeyValuePair<string, Regex> pair in _patterns)
                    {
                        if (pair.Value.IsMatch(text.Substring(i, j)))
                        {
                            matched = true;
                            tokens.Add(text.Substring(i, j));
                            i += j - 1;
                            break;
                        }
                    }
                }
                if (!matched)
                {
                    throw new AXTokenException(i, "Unrecognized character sequence starting at index " + i.ToString() + ".");
                }
            }
            return tokens;
        }
    }
}
The structural foundation of the interpreter pattern is presented below in the Parser class.
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
public sealed class Parser
{
    private ExpressionFactory _factory = null;
    private Token _tokenizer = null;

    public Parser()
    {
        _factory = new ExpressionFactory();
        _tokenizer = new Token();
        AddAssociationInternal(@"^[a-z]$", typeof(VariableExpression));
        AddAssociationInternal(@"^\d+(\.\d+)?$", typeof(NumericExpression));
        //constants
        AddAssociationInternal(@"^PI$", typeof(PIExpression));
        AddAssociationInternal(@"^E$", typeof(EExpression));
    }

    //More methods not included....

    public Expression Parse(string text)
    {
        string copy = text.Replace(" ", string.Empty);
        List<string> tokens = _tokenizer.Tokenize(copy);
        List<Expression> expressions = CreateExpressions(tokens);
        //AXValidator.Validate(expressions); //throws
        //RemoveExcessParens(expressions);
        while (expressions.Count > 1)
        {
            //int i = DetermineHighestPrecedence(expressions);
            //CollapseExpression(expressions, i);
            //RemoveExcessParens(expressions);
        }
        return expressions[0];
    }
}
This internal class creates expressions. I like the use of the Dictionary object.
internal class ExpressionFactory
{
    private Dictionary<string, KeyValuePair<Regex, Type>> _associations;

    public ExpressionFactory()
    {
        _associations = new Dictionary<string, KeyValuePair<Regex, Type>>();
    }

    public void AddAssociation(string pattern, Type type)
    {
        if (string.IsNullOrEmpty(pattern))
            throw new AXException("The pattern cannot be null or an empty string.");
        if (type == null)
            throw new AXException("The type cannot be null.");
        if (_associations.ContainsKey(pattern))
            throw new AXException("The pattern has already been associated with a type.");
        try
        {
            _associations.Add(pattern, new KeyValuePair<Regex, Type>(new Regex(pattern), type));
        }
        catch
        {
            throw new AXException("The pattern must be a valid regular expression.");
        }
    }

    public bool RemoveAssociation(string pattern)
    {
        return _associations.Remove(pattern);
    }

    public Expression Create(string token)
    {
        if (!string.IsNullOrEmpty(token))
        {
            foreach (KeyValuePair<string, KeyValuePair<Regex, Type>> pair in _associations)
            {
                if (pair.Value.Key.IsMatch(token))
                {
                    // requires using System.Reflection for ConstructorInfo
                    ConstructorInfo info = pair.Value.Value.GetConstructor(Type.EmptyTypes);
                    return (Expression)info.Invoke(null);
                }
            }
        }
        return null;
    }
}
public sealed class Context
{
    public enum Variable
    {
        a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z
    }

    private Dictionary<Variable, double> _bindings;

    public Context()
    {
        _bindings = new Dictionary<Variable, double>();
    }

    public void Bind(Variable variable, double value)
    {
        _bindings[variable] = value;
    }

    public double Lookup(Variable variable)
    {
        if (_bindings.ContainsKey(variable))
            return _bindings[variable];
        else
            return double.NaN;
    }
}
public abstract class Expression
{
    // small tolerance used to snap near-zero results to zero
    // (the value here is an assumption; the article defines its own)
    private const double ZERO_THRESHOLD = 1e-12;

    protected bool _isBound;

    public Expression()
    {
        _isBound = false;
    }

    public bool IsBound()
    {
        return _isBound;
    }

    public void Bind(params object[] arguments)
    {
        InnerBind(arguments);
        _isBound = true;
    }

    public double Evaluate(ExpressionContext context)
    {
        double result = InnerEvaluate(context);
        if (Math.Abs(result) <= ZERO_THRESHOLD)
            return 0;
        else
            return result;
    }

    protected abstract void InnerBind(params object[] arguments);
    protected abstract double InnerEvaluate(ExpressionContext context);
}

public sealed class PIExpression : Expression
{
    // constants need no binding
    protected override void InnerBind(params object[] arguments) { }

    protected override double InnerEvaluate(ExpressionContext context)
    {
        return Math.PI;
    }
}

public sealed class EExpression : Expression
{
    // constants need no binding
    protected override void InnerBind(params object[] arguments) { }

    protected override double InnerEvaluate(ExpressionContext context)
    {
        return Math.E;
    }
}
We can see all the elements of the interpreter pattern in the code above: Parser, Context, and Expressions. Consider parsing the example "E+PI" using this code.
Parser parse = new Parser();
Expression express = parse.Parse("E+PI");
Context ct = new Context();
MessageBox.Show(express.Evaluate(ct).ToString());
Clearly, the code above provides a pattern for both a language and a grammar for communication, and it is extensible: additional grammatical rules can be added by subclassing the Expression class.
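For example, a new constant can be added by subclassing Expression in the same way PIExpression does above; the golden-ratio expression below is my own illustration, not part of the article:

// Illustrative extension: a new constant added by subclassing Expression.
public sealed class GoldenRatioExpression : Expression
{
    // constants need no binding
    protected override void InnerBind( params object[] arguments ) { }

    protected override double InnerEvaluate( ExpressionContext context )
    {
        return ( 1.0 + Math.Sqrt( 5.0 ) ) / 2.0;
    }
}

A matching token pattern, say @"^PHI$", would then be registered with the ExpressionFactory through its AddAssociation method so the tokenizer and factory can produce the new expression.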
Wednesday, January 28, 2009
EDA: Message Boards
Running the example below produces the following output:
Agent 1 receives the message: ARIMA estimate is High.
Agent 2 receives the message: ARIMA estimate is High.
Agent 1 receives the message: SV estimate is Low.
Agent 3 receives the message: SV estimate is Low.
using System;
public class Subscriber
{
    private string m_AgentName;

    public Subscriber( string name )
    {
        m_AgentName = name;
    }

    public void OnReceptionOfMessage( Object sender, EventArgs e )
    {
        MessageEventArgs msg = e as MessageEventArgs;
        if ( msg != null )
        {
            Console.WriteLine( m_AgentName + " receives the message: " + msg.GetMessageInfo( ) );
        }
    }
}
//The Event Body
public class MessageEventArgs : EventArgs
{
    private string m_Message;

    public MessageEventArgs( string message )
    {
        m_Message = message;
    }

    public string GetMessageInfo( ) { return m_Message; }
}
public delegate void MessageEventHandler( Object sender, EventArgs e );

//Messages are published by triggering events.
public class MessageBoard
{
    public event MessageEventHandler ARIMA;
    public event MessageEventHandler SV;

    public void OnARIMA( MessageEventArgs modelReport )
    {
        if ( ARIMA != null )
            ARIMA( this, modelReport );
    }

    public void OnSV( MessageEventArgs modelReport )
    {
        if ( SV != null )
            SV( this, modelReport );
    }
}
public class AgentModel
{
    public static void Main( )
    {
        MessageBoard msgBoard = new MessageBoard( );
        //Create Subscribers
        Subscriber Agent1 = new Subscriber( "Agent 1" );
        Subscriber Agent2 = new Subscriber( "Agent 2" );
        Subscriber Agent3 = new Subscriber( "Agent 3" );
        //Link agents to the message board
        msgBoard.ARIMA += Agent1.OnReceptionOfMessage;
        msgBoard.ARIMA += Agent2.OnReceptionOfMessage;
        msgBoard.SV += Agent1.OnReceptionOfMessage;
        msgBoard.SV += Agent3.OnReceptionOfMessage;
        //Publish the results of the analysis to the message board
        msgBoard.OnARIMA( new MessageEventArgs( "ARIMA estimate is High" ) );
        msgBoard.OnSV( new MessageEventArgs( "SV estimate is Low" ) );
    }
}
Event Driven Architecture: The Observer Pattern
Basics
An event can be considered any notable change that occurs in or out of a business domain. It might suggest a problem, an opportunity, or a threshold. Thus, we need to have a definition and an instance of the event. Also, we should have an event header that describes, for example, the following items (sketched in code after the list):
-Event Specification ID
-TypeName
-TimeStamp
-Occurrence Number
-Creator
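A minimal sketch of such a header as a C# class, assuming these field names and types; the actual schema would come from the ontology discussed next:

using System;
// Hypothetical event header; the fields follow the list above.
public class EventHeader
{
    public Guid EventSpecificationId { get; set; }
    public string TypeName { get; set; }
    public DateTime TimeStamp { get; set; }
    public long OccurrenceNumber { get; set; }
    public string Creator { get; set; }
}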
The event body is needed to describe what happened and should be developed by use of an ontology. There are three styles associated with EDA:
(1) Simple Event Processing - A notable event happens that initiates down-stream processing
(2) Stream Event Processing - Both ordinary and important events occur. Here ordinary events are filtered for importance and streamed to subscribers.
(3) Complex Event Processing - Here the evaluation of many different types of events occur.
Because of the different event types, event correlation is needed, which can be causal, temporal, and/or spatial. Thus, there is a need for event interpreters, event pattern definition and matching, and correlation techniques.
Event Processing
There are four logical layers in our EDA:
(1) Event Generators - The source generates an event.
(2) Event Channels - The channel is the messaging backbone for events.
(3) Event Processing - Events are evaluated against event processing rules. This is the event engine.
(4) Event Driven Activity - The invocation of an activity, either pushed by the Event Processing layer or pulled by subscribers of event publications.
We can implement all four logical layers in the C#.NET language.
C#.NET Language
In a .NET language, an event can be considered to be the outcome of an action. There is an event source and a receiver. The object that raises the event is the source, and the object that responds is the receiver. A delegate is the communication channel between the event source and receiver. This delegate needs to be declared, instantiated, and invoked. For example,
A: Declaration
public delegate int TestDelegate( object obj1, object obj2 );
B: Instantiation
public void TestMethod()
{
    TestDelegate TD = new TestDelegate( TestDelegateMethod );
}
C: Invocation
TestDelegateMethod(" This is a Test.");
Consider now tying this delegate to an event. Back to the idea of event source and receiver. Since delegates can be called anonymously, we need an event so that a client class can pass in delegates to do something when something happens. For example,
public delegate void TestDelegate(int a1);
public event TestDelegate TestEvent;
Now that we have declared both the delegate and event, we can specify an event handler using the += operator for the attachment. For example,
TestClass.TestEvent += new TestDelegate( TestMethod );
we can detach the event handler by
TestClass.TestEvent -= new TestDelegate( TestMethod );
In order to raise the event, we can just make a call to TestEvent(8). Of course, we can extend this to different types of delegates: single-cast and multi-cast. A single-cast delegate calls only one function, while a multi-cast delegate points to a linked list and calls all the functions in that list. The base class for multi-cast delegates is System.MulticastDelegate. I think it is instructive here for EDA to look at an example,
public static Delegate Combine( Delegate a, Delegate b );

class TestDelegate1
{
    public void displayDelegate1( string s1 )
    {
        Console.WriteLine( "TestDelegate1" );
    }
}

class TestDelegate2
{
    public void displayDelegate2( string s1 )
    {
        Console.WriteLine( "TestDelegate2" );
    }
}

public delegate void OnMessage( string s );

class TestMulticastDelegates
{
    public static void Main( string[] args )
    {
        TestDelegate1 md1 = new TestDelegate1();
        TestDelegate2 md2 = new TestDelegate2();
        OnMessage om1 = new OnMessage( md1.displayDelegate1 );
        OnMessage om2 = new OnMessage( md2.displayDelegate2 );
        OnMessage omc;
        //Combine the delegates; omc points to the head of the invocation list
        omc = (OnMessage)Delegate.Combine( om1, om2 );
        //Array of delegate references
        Delegate[] arrayOnMessage = omc.GetInvocationList();
        OnMessage om3;
        //Navigate through the array to call each delegate
        for ( int i = 0; i < arrayOnMessage.Length; i++ )
        {
            om3 = (OnMessage)arrayOnMessage[i];
            om3( "string" );
        }
    }
}
One can see how delegates and event handling fit into the framework of the EDA. The delegates are the event channel and their signatures form the event headers and body. For an example of event processing and event driven behavior consider using the Observer pattern.
Observer Pattern in C#.NET
Here is a classic example of the publish/subscribe or observer pattern for delegates and event handling.
using System;
// (2) Event Channel
public delegate void EventHandler(object sender, EventArgs e);
public class Button
{
    //The published event; subscribers attach their handlers to it
    public event EventHandler Clicked;

    // (3) Event Processing Engine with rules
    protected virtual void OnClicked( EventArgs e )
    {
        if ( Clicked != null )
            //(1) This is the event generator
            Clicked( this, e );
    }
}
public class TestWindow
{
    private Button TestButton;

    public TestWindow()
    {
        TestButton = new Button();
        //Attach the event to the control for communication
        TestButton.Clicked += new EventHandler( TestButton_Clicked );
    }

    private void TestButton_Clicked( object sender, EventArgs e )
    {
        //(4) Event Driven Activity
        //Method is invoked when Clicked(this,e) is called.
    }
}
Conclusion
Using an EDA with event streaming for event data management provides opportunities for simulation, analysis, root-cause analysis, validation, auditing, and compliance. By using C#.NET delegates and events in the observer pattern, it becomes easy to develop an event driven architecture as discussed above.
Tuesday, January 27, 2009
Poetry and Parsing Text
Figure 1. CutNMix Interface

In the Help section of the application, the author(s) state
"In 1997, the first version of Cut'n'Mix to include multi-source text mixing was released. The intention was to create the same kind of look-and-feel you might find in a 4-track audio cassette recorder but with the capability to mix text instead of sound. Subsequent versions added word shredding and gluing, morphing, swapping, random word fill, web page fill and a "custom wordbook" function to allow users to store their own word databases."
In thinking about the architecture of this application, one gains insight into how to utilize a parsing application for examining students' posts, and in the process it reveals ideas about writing poetry.
The Lokad Business Model
Along this train of thought, it would be nice to have these models available as a web service that graduate students from other disciplines can subscribe to without having the necessary expertise to build the model; they would just use it. Lokad uses
SourceForge.NET, http://sourceforge.net/projects/lokad/ , to provide open-source integration into 3rd party applications. Of course, we are not limited to just temporal dimensions, but can use spatial models as well for fMRI and EEG.
I want to point out that they also have a sandbox development server so researchers can develop software products using their forecasting technology. Finally, their customer support section has executive summaries, documentation, questions and answers, and support only products. I think that any company that wants to use the web services subscription business model can benefit from the replication of these ideas and content contained on their web site.
Stochastic Volatility Models: WinBUGS and R
WinBUGS is available at http://www.mrc-bsu.cam.ac.uk/bugs/ .
An example of a stochastic volatility model in WinBUGS and R is posted on my older notes site at
http://www.thecromwellworkshop.com/NotesFromWorkshop/BayesianStochasticVolatilityModels.aspx. However, I am in the process of migrating these older posts, so I reproduce the information here.
A: The Stochastic Volatility Model for WinBUGS
model
{
mu~dnorm(0,0.04)
phistar~dbeta(20,1.5)
itau2~dgamma(2.5,0.025)
rho~dunif(-1,1)
#beta<-exp(mu/2)
phi<-2*phistar-1
pi <- 3.141592654
tau<-sqrt(1/itau2)
alpha<-mu*(1-phi)
theta0 ~ dnorm(mu,itau2)
thmean[1] <- mu + phi*(theta0-mu)
theta[1] ~ dnorm(thmean[1],itau2)I(-100,100)
for (i in 2:N) {
thmean[i] <- mu + phi*(theta[i-1]-mu)
theta[i] ~ dnorm(thmean[i],itau2)I(-80,80)
}
Ymean[1]<-0
Yisigma2[1]<-1/(exp(theta[1]))
Y[1]~dnorm(Ymean[1],Yisigma2[1])
loglike[1]<- (-0.5*log(2*pi) + 0.5*log(Yisigma2[1]) - 0.5*Yisigma2[1]*Y[1]*Y[1])
for (i in 2:N) {
Ymean[i]<-rho/tau*exp(0.5*theta[i])*(theta[i]-mu-phi*(theta[i-1]-mu))
Yisigma2[i] <- 1/(exp(theta[i])*(1-rho*rho))
Y[i]~ dnorm(Ymean[i],Yisigma2[i])
loglike[i]<- (-0.5*log(2*pi) + 0.5*log(Yisigma2[i]) - 0.5*Yisigma2[i]*Y[i]*Y[i])
}
node1<- -sum(loglike[])
}
B: Running the above model (A) in R with BRugs.
Here is the driver script for the model, saved as SV1.txt. Just copy and paste it into the R GUI at the prompt. Make sure that you have downloaded the required libraries; the 10,000 iterations take some time, so you might want to reduce the count initially when testing the algorithm.
library(BRugs)
#
# Check the syntax of the model
#
modelCheck("SV1.txt")
#
# Load the Data
#
modelData("dSugar.txt")
#
# Specify the number of chains
#
modelCompile(numChains=1)
#
# Specify the initial conditions for the chains
#
#modelInits("SVinit2.txt")
modelGenInits()
#
# Initial burn-in
#
modelUpdate(1000)
#
# Monitor the variables
#
samplesSet(c("phi","mu","itau2","rho"))
#
# Do more iterations
#
modelUpdate(10000)
#
# variable statistics
#
samplesStats("*")
#
#Deviance Information Criterion
#
dicSet()
dicStats()
#
# Build MCMC objects for CODA
#
MC.phi<-buildMCMC("phi")
MC.itau2<-buildMCMC("itau2")
MC.rho<-buildMCMC("rho")
MC.mu<-buildMCMC("mu")
#
# Raftery and Lewis diagnostics
#
raftery.diag(MC.phistar, q=0.025, r=0.005, s=0.95, converge.eps=0.001)
raftery.diag(MC.itau2, q=0.025, r=0.005, s=0.95, converge.eps=0.001)
raftery.diag(MC.rho, q=0.025, r=0.005, s=0.95, converge.eps=0.001)
raftery.diag(MC.mu, q=0.025, r=0.005, s=0.95, converge.eps=0.001)
#
# Geweke convergence diagnostics
#
geweke.diag(MC.phistar, frac1=0.1, frac2=0.5)
geweke.diag(MC.itau2, frac1=0.1, frac2=0.5)
geweke.diag(MC.rho, frac1=0.1, frac2=0.5)
geweke.diag(MC.mu, frac1=0.1, frac2=0.5)
#
C: Here is the output in R.
R version 2.4.0 (2006-10-03)
Copyright (C) 2006 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(BRugs)
Welcome to BRugs running on OpenBUGS version 2.2.0 beta
> #
> # Check the syntax of the model
> #
> modelCheck("SV1.txt")
model is syntactically correct
> #
> # Load the Data
> #
> modelData("dSugar.txt")
data loaded
> #
> # Specify the number of chains
> #
> modelCompile(numChains=1)
model compiled
> #
> # Specify the initial conditions for the chains
> #
> #modelInits("SVinit2.txt")
> modelGenInits()
initial values generated, model initialized
> #
> # Initial burn-in
> #
> modelUpdate(1000)
1000 updates took 212 s
> #
> # Monitor the variables
> #
> samplesSet(c("phi","mu","itau2","rho"))
monitor set for variable 'phi'
monitor set for variable 'mu'
monitor set for variable 'itau2'
monitor set for variable 'rho'
> #
> # Do more iterations
> #
> modelUpdate(10000)
10000 updates took 1949 s
> #
> # variable statistics
> #
> samplesStats("*")
mean sd MC_error val2.5pc median val97.5pc start sample
itau2 3.5300 0.70520 0.062810 2.4280 3.4170 5.03200 1001 10000
mu -8.1460 0.06697 0.002150 -8.2740 -8.1470 -8.01000 1001 10000
phi 0.8258 0.03279 0.002703 0.7589 0.8264 0.88310 1001 10000
rho -0.1139 0.03989 0.001499 -0.1907 -0.1138 -0.03554 1001 10000
> #
> #Deviance Information Criterion
> #
> dicSet()
deviance set
> dicStats()
Dbar Dhat DIC pD
Y 0 0 0 0
theta 0 0 0 0
total 0 0 0 0
> #
> # Build MCMC objects for CODA
> #
> MC.phi<-buildMCMC("phi")
Loading required package: coda
Loading required package: lattice
Warning message:
package 'lattice' was built under R version 2.4.1
> MC.itau2<-buildMCMC("itau2")
> MC.rho<-buildMCMC("rho")
> MC.mu<-buildMCMC("mu")
>
> #
> # Raftery and Lewis diagnostics
> #
> raftery.diag(MC.phistar, q=0.025, r=0.005, s=0.95, converge.eps=0.001)
Error in inherits(x, "mcmc.list") : object "MC.phistar" not found
> raftery.diag(MC.itau2, q=0.025, r=0.005, s=0.95, converge.eps=0.001)
Quantile (q) = 0.025
Accuracy (r) = +/- 0.005
Probability (s) = 0.95
Burn-in Total Lower bound Dependence
(M) (N) (Nmin) factor (I)
itau2 144 160832 3746 42.9
> raftery.diag(MC.rho, q=0.025, r=0.005, s=0.95, converge.eps=0.001)
Quantile (q) = 0.025
Accuracy (r) = +/- 0.005
Probability (s) = 0.95
Burn-in Total Lower bound Dependence
(M) (N) (Nmin) factor (I)
rho 14 16032 3746 4.28
> raftery.diag(MC.mu, q=0.025, r=0.005, s=0.95, converge.eps=0.001)
Quantile (q) = 0.025
Accuracy (r) = +/- 0.005
Probability (s) = 0.95
Burn-in Total Lower bound Dependence
(M) (N) (Nmin) factor (I)
mu 3 4095 3746 1.09
> #
> # Geweke convergence diagnostics
> #
> geweke.diag(MC.phistar, frac1=0.1, frac2=0.5)
Error in inherits(x, "mcmc.list") : object "MC.phistar" not found
> geweke.diag(MC.itau2, frac1=0.1, frac2=0.5)
[[1]]
Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5
itau2
-3.931
> geweke.diag(MC.rho, frac1=0.1, frac2=0.5)
[[1]]
Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5
rho
2.779
> geweke.diag(MC.mu, frac1=0.1, frac2=0.5)
Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5
mu
-3.410
As I build the web services API to re-run the empirical section of the article mentioned above, I plan to combine the two into a tutorial post for http://www.codeproject.com/ and discuss some of these features at length. I like the approach of http://www.lokad.com/ for on-line demand forecasting and its web services API for developers, http://www.lokad.com/Developers.ashx . This company, located in Paris, France, provides a good working model of how to develop and use an SOA architecture for time series forecasting purposes.
References
Yu, J. (2002) Forecasting volatility in the New Zealand stock market. Applied Financial Economics 12, 193-202.
Yu, J. (2005) On leverage in a stochastic volatility model. Journal of Econometrics 127, 165-178.
Monday, January 26, 2009
Parsing Text Files
Lysle (2008) has an article on parsing that I can use for these purposes at http://www.csharpcorner.com/UploadFile/scottlysle/ParseSentencesCS02112008055809AM/ParseSentencesCS.aspx. Figure 1 shows the user interface of the sample application in C#.
Figure 1. Parsing Interface

The author illustrates three methods from the article:
- Parse Reasonable: Split the text using typical sentence terminations and keep the sentence termination.
- Parse Best: Split the text based upon the use of a regular expression (sketched after this list)
- Parse Without Endings: Split the text without sentence terminations.
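As a minimal sketch of the regular-expression approach (the terminator pattern here is my own simple assumption; Lysle's article uses its own expression):

using System;
using System.Text.RegularExpressions;

class SentenceSplitter
{
    // split after '.', '!' or '?' followed by whitespace,
    // keeping the terminator with the sentence
    static string[] SplitSentences( string text )
    {
        return Regex.Split( text, @"(?<=[\.!\?])\s+" );
    }

    static void Main()
    {
        foreach ( string s in SplitSentences( "One. Two! Three?" ) )
            Console.WriteLine( s );
    }
}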
Of course, I would like to be able to copy and paste the text from the clipboard. Here is an example of the two methods for the controls btnCopy and btnPaste.
private void btnCopy_Click(object sender, EventArgs e)
{
Clipboard.SetText(txtBoxA.Text);
}
//paste the text
private void btnPaste_Click(object sender, EventArgs e)
{
txtBoxB.Text = Clipboard.GetText();
}
Of course, we can also parse to XML with the following code by Cochran at http://www.csharpcorner.com/UploadFile/rmcochran/FlatFileToXmlDocument06302007111353AM/FlatFileToXmlDocument.aspx . Here is the example,
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;
using System.Text.RegularExpressions;
using System.IO;
namespace FlatFileParser
{
public static class Parser
{
#region Member Variables
private const string
strComma = ",",
strTemporaryPlaceholder = "~~`~~",
strTab = "\t";
private static readonly Regex
m_commaFixer = new Regex("\".*?,.*?\"", RegexOptions.Compiled),
m_quotesOnBothEnds = new Regex("^\".*\"$", RegexOptions.Compiled);
#endregion
#region Methods
public static XmlDocument ParseTabToXml(string input, string topElementName, string recordElementName,
params string[] recordItemElementName)
{
XmlDocument doc = ParseToXml(input, new char[] { strTab[0] }, topElementName, recordElementName,
recordItemElementName);
PostProcess(doc, PostProcessTabNode);
return doc;
}
public static XmlDocument ParseCsvToXml(string input, string topElementName, string recordElementName,
params string[] recordItemElementName)
{
input = PreProcessCSV(input);
XmlDocument doc = ParseToXml(input, new char[] { strComma[0] }, topElementName, recordElementName,
recordItemElementName);
PostProcess(doc, PostProcessCsvNode);
return doc;
}
#endregion
#region Utility Methods
private static XmlDocument ParseToXml(string input, char[] seperator, string topElementName, string recordElementName, string[] recordItemElementName)
{
string[][] data = Dissasemble(input, seperator);
return BuildDocument(data, topElementName, recordElementName, recordItemElementName);
}
private static string PreProcessCSV(string input)
{
MatchCollection collection = m_commaFixer.Matches(input);
foreach (Match m in collection)
input = input.Replace(m.Value, m.Value.Substring(1, m.Value.Length - 2).Replace(strComma, strTemporaryPlaceholder));
return input;
}
private static void PostProcess(XmlNode node, Action<XmlNode> process)
{
process(node);
foreach (XmlNode subNode in node.ChildNodes)
PostProcess(subNode, process);
}
private static void PostProcessTabNode(XmlNode node)
{
if (!String.IsNullOrEmpty(node.Value) && m_quotesOnBothEnds.IsMatch(node.Value))
node.Value = node.Value.Substring(1, node.Value.Length - 2);
}
private static void PostProcessCsvNode(XmlNode node)
{
if (!String.IsNullOrEmpty(node.Value))
node.Value = node.Value.Replace(strTemporaryPlaceholder, strComma);
}
#endregion
}
}
Using these building blocks, one can build an application to accomplish the ideas presented above.
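A hypothetical call to the excerpted class, assuming the elided BuildDocument and Dissasemble methods are present, might look like:

// Illustrative usage only; the argument values are made up.
// Requires using System; and using System.Xml;
XmlDocument doc = Parser.ParseCsvToXml(
    "1,2,3\r\n4,5,6",        // flat file content
    "records",               // top element name
    "record",                // record element name
    "colA", "colB", "colC"   // record item element names
);
Console.WriteLine( doc.OuterXml );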
Friday, January 23, 2009
Algorithmic Composition: Directed Acyclic Graphs
Approaches to algorithmic composition include:
- mathematical models
- knowledge based systems
- grammars
- evolutionary methods
- learning systems
Of course, using stochastic processes such as Markov Chains for composition is obvious, as are fractals from L-Systems. Personally, I like the idea of developing grammars for a particular content domain and mixing all approaches with evolutionary methods like genetic algorithms and learning systems like neural networks. Therefore, I am moving toward the stochastic context-free grammars (SCFG) used in speech recognition systems and demonstrated in Hierarchical Hidden Markov Models (HMM). Knudsen and Hein (1999) provide a context for RNA prediction with SCFG at http://bioinformatics.oxfordjournals.org/cgi/reprint/15/6/446 . Shokhirev has C++ code and a discussion on HMM at http://www.shokhirev.com/nikolai/abc/alg/hmm/hmm.html .
Of course, all this talk brings us to Directed Acyclic Graphs (DAG). A good, general PowerPoint on DAGs can be found via http://www.google.com/search?hl=en&q=Direct+Acyclic+graphs . The Code Project has a discussion on modeling DAGs in SQL databases, which is instructive for the construction of ontologies, at http://www.codeproject.com/KB/database/Modeling_DAGs_on_SQL_DBs.aspx . Microsoft has a good article on setting up adjacency matrices with DAGs in graph classes, with examples for common graph algorithms. The spatial domain can be included with HMM, as in the example at http://www.stats.bris.ac.uk/~peter/papers/asatm-01209r1.pdf for disease mapping.
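As a small sketch of the adjacency idea (the class and method names are mine, not from the cited articles), a DAG can be stored as an adjacency structure and ordered with a topological sort, which also detects cycles:

using System;
using System.Collections.Generic;

class Dag
{
    private Dictionary<string, List<string>> adj = new Dictionary<string, List<string>>();

    public void AddEdge(string from, string to)
    {
        if (!adj.ContainsKey(from)) adj[from] = new List<string>();
        if (!adj.ContainsKey(to)) adj[to] = new List<string>();
        adj[from].Add(to);
    }

    // Kahn's algorithm: returns a topological order, or throws on a cycle.
    public List<string> TopologicalSort()
    {
        Dictionary<string, int> indegree = new Dictionary<string, int>();
        foreach (string v in adj.Keys) indegree[v] = 0;
        foreach (List<string> targets in adj.Values)
            foreach (string t in targets) indegree[t]++;

        Queue<string> ready = new Queue<string>();
        foreach (KeyValuePair<string, int> kv in indegree)
            if (kv.Value == 0) ready.Enqueue(kv.Key);

        List<string> order = new List<string>();
        while (ready.Count > 0)
        {
            string v = ready.Dequeue();
            order.Add(v);
            foreach (string t in adj[v])
                if (--indegree[t] == 0) ready.Enqueue(t);
        }
        if (order.Count != adj.Count)
            throw new InvalidOperationException("Graph contains a cycle.");
        return order;
    }
}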
An excellent article on estimating high-dimensional DAGs is at http://jmlr.csail.mit.edu/papers/volume8/kalisch07a/kalisch07a.pdf. The authors use the R statistical language in their simulations. Back in 2005, I was looking at using DAGs for stream networks and estimating analyte concentration through sampling as part of a Bayesian MCMC framework for a West Virginia watershed. Thus, a grammar, if you will, that would serve me well for the algorithmic composition of paintings extending the work of the Russian painter Wassily Kandinsky, who stated that
"Of all the arts, abstract painting is the most difficult. It demands that you know how to draw well, that you have a heightened sensitivity for composition and for colors, and that you be a true poet. This last is essential."
More later.
Wednesday, January 21, 2009
Neural Network Hypercubes
The idea of extracting different feature domains from 2D images led to my desire to revisit my box of neural network papers. I am thinking that I can combine these models with the Kalman Filters of the previous post. In the quote above, the authors set the stage for using a hybrid N-Feature Neural Network (HNFNN) structure that uses radial basis function neural networks for three feature domains.
A good place to start for Java is at http://fbim.fh-regensburg.de/~saj39122/jfroehl/diplom/e-index.html . Another is the Java Object Oriented Neural Engine (JOONE) at http://www.jooneworld.com/docs/sampleEngine.html . As far as applets go for demos, check out http://neuron.eng.wayne.edu/software.html .

The hypercube structure lends itself naturally to neural network models with multiple hidden layers and an extension to the basic ideas expressed above.
Five Dimensional Fusion: The Kalman Filter
Cao (2008) has a file exchange at http://www.mathworks.com/matlabcentral/fileexchange/18189 for using the EKF for nonlinear state estimation. Of course, the work of Wikle et al. (1998) at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.8958 takes us down this path with environmental models and the discipline of environmetrics. One of my favorites is the short paper on space-time KF at http://www.stat.missouri.edu/~wikle/s037-_o.pdf . For a comparison with other techniques, see http://portal.acm.org/citation.cfm?id=1238264 . Of course, the extension to Markov Chain Monte Carlo (MCMC) with the WinBUGS software can be done here with the work at http://www.stat.ncsu.edu/library/papers/ISMS2587.pdf. Furthermore, neural networks can be employed in a variety of architectures, which is the subject of the next post. These authors take a reparameterization approach for their dynamic space-time models. The bibliographies of these works provide many streams to navigate for applied work. For example,
- Object Tracking (e.g., missiles and faces)
- Navigation
- Computer vision such as Feature and cluster tracking and fusing data
Code
A free C++ EKF library can be found at http://kalman.sourceforge.net/ . A great example of using the KF with C#.NET is the work by Pellarin and Andersen (2006) found at http://www.itu.dk/~vedrana/reports/PF_ReportH.pdf . They examine using the KF within a Particle Filter for eye tracking. If we desire to use Java, there is code at http://mathstat.asu.edu/~eubank/kalman/kalman.html . The paper that uses the code is "Adaptive Order Selection for Spline Smoothing" at http://math.asu.edu/~eubank/kalman/recipe.pdf. Delaney and Ward have a neat Java tool for exploring state estimation using the KF at
http://eprints.nuim.ie/201/1/Kalman2004_ddfina_submitted.PDF .
For more information, check out Welch's work at http://www.cs.unc.edu/~welch/kalman/ .
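To fix ideas, here is a minimal scalar Kalman filter in C#, a sketch of the predict/correct cycle that the libraries above implement in full matrix form; the noise variances and measurements are made-up values:

using System;

class ScalarKalmanDemo
{
    static void Main()
    {
        double xHat = 0.0, p = 1.0;  // state estimate and its variance
        const double q = 1e-4;       // process noise variance (assumed)
        const double r = 0.01;       // measurement noise variance (assumed)

        foreach ( double z in new double[] { 0.39, 0.50, 0.48, 0.29, 0.25 } )
        {
            p += q;                      // predict: random-walk state
            double k = p / ( p + r );    // Kalman gain
            xHat += k * ( z - xHat );    // correct with measurement z
            p *= ( 1 - k );              // update error variance
            Console.WriteLine( "estimate = " + xHat );
        }
    }
}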
Wednesday, January 14, 2009
Intelligent IVR and Web Services
They have a call simulator that permits you to prototype your IVR at http://www.ivrforbeginners.com/downloads/downloads.htm .
Use Cases and UML
Each use case description can then be set aside as test cases. Another good article, from a different perspective, looks at mistakes made with Use Cases: http://alumno.ucol.mx/victorc/public_html/fime/docs/TopTenUseCaseMistakes.pdf.
As part of requirements analysis with UML, we have Use Case, Class, Activity, and StateChart diagrams; business modeling with workflow; Object diagrams; and packages and model management. For an example of these, a course on Object Oriented Software Engineering at http://www.cs.jhu.edu/~scott/oose/index.shtml provides many of the details for using UML. I like the patent application reference at
http://www.freepatentsonline.com/y2002/0026364.html under design and textual analysis. Furthermore, there is a good example of an ATM system with Use Cases and diagrams at http://www.math-cs.gordon.edu/local/courses/cs211/ATMExample/.
This is a good example of building an application from start to finish with UML. Another good resource, Scott W. Ambler's Agile Modeling site at http://www.agilemodeling.com/style/useCaseDiagram.htm, provides links for Use Case templates, agile architectures, and best practices: http://www.agilemodeling.com/essays/bestPractices.htm. Of course, by performing this level of abstraction in an iterative fashion, many potential pitfalls and problems can be avoided or solved before a single line of code is written.
Monday, January 12, 2009
Java and C# Design Patterns
Hidden Markov Models
A tutorial on using HMMs for speech recognition is at http://www.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf , and examples of the Baum-Welch and Viterbi algorithms, along with a list of C and MATLAB packages, are at http://www.accenture.com/NR/rdonlyres/6EA0F25D-7FA7-4B43-AFDB-CBA983F1347F/0/HMMTutorialPart2.pdf .
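As a compact sketch of the Viterbi algorithm for a two-state HMM (the probabilities here are made up; the tutorials above cover log-space scaling and Baum-Welch training):

using System;

class ViterbiDemo
{
    static void Main()
    {
        double[] start = { 0.6, 0.4 };
        double[,] trans = { { 0.7, 0.3 }, { 0.4, 0.6 } };  // P(next | current)
        double[,] emit = { { 0.5, 0.5 }, { 0.1, 0.9 } };   // P(observation | state)
        int[] obs = { 0, 1, 1 };

        int nStates = start.Length, T = obs.Length;
        double[,] v = new double[T, nStates];   // best path probabilities
        int[,] back = new int[T, nStates];      // backpointers

        for ( int s = 0; s < nStates; s++ )
            v[0, s] = start[s] * emit[s, obs[0]];

        for ( int t = 1; t < T; t++ )
            for ( int s = 0; s < nStates; s++ )
                for ( int prev = 0; prev < nStates; prev++ )
                {
                    double cand = v[t - 1, prev] * trans[prev, s] * emit[s, obs[t]];
                    if ( cand > v[t, s] ) { v[t, s] = cand; back[t, s] = prev; }
                }

        // trace back the most likely state sequence
        int[] path = new int[T];
        path[T - 1] = ( v[T - 1, 0] > v[T - 1, 1] ) ? 0 : 1;
        for ( int t = T - 1; t > 0; t-- )
            path[t - 1] = back[t, path[t]];

        foreach ( int s in path )
            Console.Write( s + " " );
    }
}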
References
Churbanov, A. and S. Winters-Hilt (2008). "Implementing EM and Viterbi algorithms for Hidden Markov Model in linear memory," BMC Bioinformatics 9:224.
A Look at the Sphinx
http://cmusphinx.sourceforge.net/sphinx4/.
In the presentation, notice the use of Hidden Markov Models (i.e., http://htk.eng.cam.ac.uk/) with 3-5 states to model a phoneme, where each state is a Gaussian mixture density function. As stated in the PowerPoint, word transitions are modeled in terms of transition probabilities. For more information, see the white paper at http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4Whitepaper.pdf. Another good PowerPoint presentation on speech recognition and HMMs is at http://www.cis.upenn.edu/~cis530/slides-2008/530-speech-rec-2008.ppt. Abate (2007) has a good review article on speech recognition at http://nlp.amharic.org/Members/solomon/papers/ies16conference.pdf
The Java Speech API enables the use of speech technology for graphical user interfaces. Documentation on the API can be found here at http://java.sun.com/products/java-media/speech/reference/api/index.html. There is a good article on "Processing Speech with Java" at http://www.developer.com/java/other/article.php/1471001 and a tutorial at http://javaboutique.internet.com/tutorial/speechapi/.
As far as Natural Language Processing for speech recognition goes, MIT has a good advanced course on NLP at http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-864Fall-2005/Syllabus/index.htm. The linguistic layers of pragmatic, semantic, syntactic, lexical, morphologic, and phonetic analysis are used as part of NLP. For a more complete description, see http://courses.mbl.edu/Medical_Informatics/99.2/Outlines/Starren/jbs_acquisition1.ppt.
An interesting application for this is with low-cost note-taking, i.e.
http://www.csc.villanova.edu/~tway/publications/wayAT08.pdf. Another example is speech recognition for Family Medicine at https://idealhealth.wikispaces.com/Speech+recognition which provides information, resources, and links.
Speech recognition is a fascinating field because it models analog waves through translation into digital signatures that have semantic content. Thus, real-time data flows, i.e. streams like stock prices or EEG signals, can be mapped into symbols that can be arranged into clusters/groups for more semantic content. This provides multiple layers of abstraction that can be used in inter- and intra-disciplinary ways. Again, it seems like a different path, but it is not.
Thursday, January 8, 2009
Semantic Web Services
This post looks at compiling ontology constructions to create web services. Cardoso has a good article, "On the Move to Semantic Web Services," at http://www.waset.org/pwaset/v8/v8-3.pdf . For example, we could compile an ontology to Java classes that generate atomic semantic web services and then compose a complex semantic web service.
This implies a two-stage Semantic web service creation:
(1) Extract ontologies to Java APIs
(2) Create semantic web services from Java APIs
In Cardoso's article, he shows how to use WSDL-S to map between WSDL descriptions and ontological concepts. He uses WSDL4J at http://sourceforge.net/projects/wsdl4j/ for WSDL-S document management. Furthermore, the article discusses the use of OWL-S as a web service language for OWL.
In order to compose the web service, logic rules derived from the ontologies can be used to check existence, extract instances, compare values, and check constraints. Kalyanpur et al. have a good article on "Automatic Mapping of OWL Ontologies into Java" at http://www.mindswap.org/~aditkal/SEKE04.pdf as well as one by Drumm on "Integrating eGovernment Services using Semantic Web Technologies" at http://dip.semanticweb.org/documents/DRUMM-Integrating-eGovernment-Services-using-Semantic-Web-Technologies.pdf. I like the high-level architecture of the prototype with (1) a presentation layer, (2) a Semantic Web Service layer, (3) a Service Abstraction layer, and (4) legacy systems. This enables interfacing with backend services that run on different platforms and operate with different technologies.
Another good article for understanding annotation issues is by Patil et al. on the Meteor-S Web Service Annotation Framework at http://lsdis.cs.uga.edu/lib/download/POSV-WWW2004.pdf . They discuss issues in matching XML schemas to ontologies. I believe it is important to understand the issues with algorithm development and implementation to match and annotate WSDL files for relevant ontologies. What is nice about Meteor-S is that it is an Eclipse plug-in that permits the creation of semantic web services from Java source code or already existing services.
Of course, Jena is a semantic web platform for Java. It uses RDF, OWL, and SPARQL and has a rule-based engine, as well as being open source: http://jena.sourceforge.net/. An article with code examples for Jena is at http://www.devx.com/semantic/Article/34968 . Here the author shows how Jena's inference processing works with OWL reasoners and the ontology model. There are other reasoning services one can use, such as Pellet at http://clarkparsia.com/pellet.
This is an initial step to develop ontological models for spatial-time series modeling that can be then used across multiple knowledge domains in the form of semantic web services.
Tuesday, January 6, 2009
NeuroImaging Ontologies
What I like about this approach is the use of two layers that are independent yet interdependent: an external public layer that provides ontologies and interfaces, and an internal layer that has the intelligence for control and reasoning. This enables the construction of domain models that not only can provide code generation, but also serve as a run-time artifact. This means that these models can be interfaced with other applications as part of the Semantic Web vision and can promote both reuse and inter-operability. Thus, as I continue to develop my ideas in the domain of fMRI with a paper by Nakai et al. (2008) at http://ipal.i2r.a-star.edu.sg/doc/publications/nakai2008mrms.pdf on "Ontology for fMRI as a Biomedical Informatics Method", I think about how to interface my domain knowledge into this domain quickly and efficiently. In their paper, Table 3, "Initial Proposal of functional magnetic resonance imaging (fMRI) ontology", provides a beginning point for the development of a Statistical Analysis ontology as a subclass of their Parameter Description class. The statistical analysis subclass would include:
-Statistical Method
-Threshold
-Post-hoc
-Covariance
This serves as a useful starting point to define the subclasses for not only the fMRI neuroimaging modality, but other modalities as well. Protégé is excellent for this with its "the SLOT of the CLASS is the FACET" approach, and for a programmer such as myself, well-versed in the making of classes, APIs, and database schemas, it easily extends my thoughts to the domain models of my current expertise. Furthermore, the use of web services and service-oriented computing (SOC) concepts lends itself naturally to this context, as mentioned in the previous paper. The extensibility of this idea to other neuroimaging modalities such as magnetoencephalography (MEG), electroencephalography (EEG), and near infrared spectroscopy (NIRS) is important because, at increasing levels of abstraction, they all need the same analytical methods, which are inherently statistical in both space and time. For example, Figure 3 in the paper shows the classification of neuroimages according to these modalities, and Figure 5 presents the two data flows: a) Science Flow and b) Clinical Flow. My interest is in the work of (a), through testing research hypotheses about the validity and accuracy of different statistical techniques and paradigms used in the data analysis component.
Ultimately, the choice of what technique to use will have nothing to do with availability, but everything to do with an intelligence engine based on agent technologies that will be able to recognize the characteristics of the data set and apply the correct model for the given situation. This is especially important because of the heterogeneous nature of medical data sets. Also, because of the eventual creation of domain-driven code at run-time, it is possible for statistical models to be created with numerical approaches that might prove intractable from theoretical descriptions. This is a very rich field of research with almost unlimited potential: conditional moments, i.e. mean, variance, skew, etc., of multivariate distributions of four dimensions, i.e. 3 for space and 1 for time, accounting for and correcting multi-dimensional linear or non-linear correlation in higher conditional moments. Why? Because of the independence assumption needed for the feature extraction necessary to identify the outcomes of stimulus-response models. A good first step is to clarify the ontology for this subclass, how it can be represented in an architecture as part of the semantic web vision, and what needs to be done to speed up the delivery of high-quality, applicable statistical techniques that can be placed into current commercial applications. A fun talk and presentation for the conference, and then it's off to riding the canal boats, standing in front of the Night Watch, comparing my paintings to Van Gogh's, and listening to some jazz in Amsterdam!
References
Nakai, T., E. Bagarinao, Y. Tanaka, K. Matsuo, and D. Racoceanu (2008). "Ontology for fMRI as a Biomedical Informatics Method," Magn Reson Med Sci, Volume 7, No. 3, pp. 141-155.