Lysle (2008) describes sentence-parsing techniques that suit these purposes in an article at http://www.csharpcorner.com/UploadFile/scottlysle/ParseSentencesCS02112008055809AM/ParseSentencesCS.aspx. Figure 1 shows the user interface of the sample application, which is written in C#.
Figure 1. Parsing Interface

The article illustrates three methods:
- Parse Reasonable: Split the text at typical sentence terminations and keep the termination with each sentence.
- Parse Best: Split the text using a regular expression.
- Parse Without Endings: Split the text and discard the sentence terminations.
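The three approaches above can be sketched roughly as follows. This is a minimal illustration and not Lysle's code: the class name `SentenceSplitter`, the method names, and the choice of `.`, `!` and `?` as terminators are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; names and terminator set are assumptions, not Lysle's code.
public static class SentenceSplitter
{
    private static readonly char[] Terminators = { '.', '!', '?' };

    // "Parse Reasonable": cut after each terminator and keep it with the sentence.
    public static List<string> ParseReasonable(string text)
    {
        var sentences = new List<string>();
        int start = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (Array.IndexOf(Terminators, text[i]) >= 0)
            {
                string sentence = text.Substring(start, i - start + 1).Trim();
                if (sentence.Length > 0) sentences.Add(sentence);
                start = i + 1;
            }
        }
        // Keep any trailing text that has no terminator.
        string rest = text.Substring(start).Trim();
        if (rest.Length > 0) sentences.Add(rest);
        return sentences;
    }

    // "Parse Best": split on whitespace that follows a terminator.
    // The lookbehind leaves the terminator attached to its sentence.
    public static string[] ParseBest(string text)
    {
        return Regex.Split(text.Trim(), @"(?<=[.!?])\s+");
    }

    // "Parse Without Endings": split on the terminators and discard them.
    public static List<string> ParseWithoutEndings(string text)
    {
        var sentences = new List<string>();
        foreach (string part in text.Split(Terminators, StringSplitOptions.RemoveEmptyEntries))
        {
            string sentence = part.Trim();
            if (sentence.Length > 0) sentences.Add(sentence);
        }
        return sentences;
    }
}
```

Calling `SentenceSplitter.ParseBest("Hello world. How are you? Fine!")` yields three sentences with their terminators intact. Because the regular expression requires whitespace after the terminator, it leaves a number such as 3.14 in one piece, whereas the character-by-character version would cut it in two.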
Of course, I would also like to copy text to and paste text from the clipboard. Here are the two Click event handlers for the controls btnCopy and btnPaste.
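// copy the text (Clipboard.SetText throws on a null or empty string, so guard first)
private void btnCopy_Click(object sender, EventArgs e)
{
    if (!String.IsNullOrEmpty(txtBoxA.Text))
    {
        Clipboard.SetText(txtBoxA.Text);
    }
}

// paste the text (GetText returns an empty string when the clipboard holds no text)
private void btnPaste_Click(object sender, EventArgs e)
{
    txtBoxB.Text = Clipboard.GetText();
}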
We can also parse tab- or comma-delimited flat files to XML with the following code by Cochran at http://www.csharpcorner.com/UploadFile/rmcochran/FlatFileToXmlDocument06302007111353AM/FlatFileToXmlDocument.aspx . Here is the example:
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;
using System.Text.RegularExpressions;
using System.IO;
namespace FlatFileParser
{
    public static class Parser
    {
        #region Member Variables
        private const string
            strComma = ",",
            strTemporaryPlaceholder = "~~`~~",
            strTab = "\t";

        private static readonly Regex
            m_commaFixer = new Regex("\".*?,.*?\"", RegexOptions.Compiled),
            m_quotesOnBothEnds = new Regex("^\".*\"$", RegexOptions.Compiled);
        #endregion
        #region Methods
        public static XmlDocument ParseTabToXml(string input, string topElementName, string recordElementName,
            params string[] recordItemElementName)
        {
            XmlDocument doc = ParseToXml(input, new char[] { strTab[0] }, topElementName, recordElementName,
                recordItemElementName);
            PostProcess(doc, PostProcessTabNode);
            return doc;
        }

        public static XmlDocument ParseCsvToXml(string input, string topElementName, string recordElementName,
            params string[] recordItemElementName)
        {
            input = PreProcessCSV(input);
            XmlDocument doc = ParseToXml(input, new char[] { strComma[0] }, topElementName, recordElementName,
                recordItemElementName);
            PostProcess(doc, PostProcessCsvNode);
            return doc;
        }
        #endregion
        #region Utility Methods
        private static XmlDocument ParseToXml(string input, char[] separator, string topElementName,
            string recordElementName, string[] recordItemElementName)
        {
            string[][] data = Dissasemble(input, separator);
            return BuildDocument(data, topElementName, recordElementName, recordItemElementName);
        }

        // Strip the quotes from each quoted field that contains a comma and
        // hide its commas behind a placeholder so they are not treated as separators.
        private static string PreProcessCSV(string input)
        {
            MatchCollection collection = m_commaFixer.Matches(input);
            foreach (Match m in collection)
                input = input.Replace(m.Value,
                    m.Value.Substring(1, m.Value.Length - 2).Replace(strComma, strTemporaryPlaceholder));
            return input;
        }
        // Apply the given action to a node and, recursively, to all of its descendants.
        private static void PostProcess(XmlNode node, Action<XmlNode> process)
        {
            process(node);
            foreach (XmlNode subNode in node.ChildNodes)
                PostProcess(subNode, process);
        }
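        // Remove the surrounding quotes from a tab-delimited value.
        private static void PostProcessTabNode(XmlNode node)
        {
            if (!String.IsNullOrEmpty(node.Value) && m_quotesOnBothEnds.IsMatch(node.Value))
                node.Value = node.Value.Substring(1, node.Value.Length - 2);
        }

        // Restore the commas that PreProcessCSV replaced with the placeholder.
        private static void PostProcessCsvNode(XmlNode node)
        {
            if (!String.IsNullOrEmpty(node.Value))
                node.Value = node.Value.Replace(strTemporaryPlaceholder, strComma);
        }

        // Dissasemble and BuildDocument are called above but are not shown in this excerpt.
        #endregion
    }
}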
Using these building blocks, one can build an application to accomplish the ideas presented above.
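Since the excerpt above omits the splitting and document-building steps, here is a small self-contained sketch of the same placeholder idea for comma-delimited input. It is not Cochran's implementation: the class `CsvToXmlSketch`, the method `ToXml`, and the simplified quoted-field pattern are assumptions made for illustration, and it does not handle escaped quotes or line breaks inside quoted fields.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical sketch; not Cochran's code.
public static class CsvToXmlSketch
{
    private const string Placeholder = "~~`~~";

    // Simplified assumption: a quoted field is any run of non-quote characters between quotes.
    private static readonly Regex QuotedField = new Regex("\"[^\"]*\"");

    public static XmlDocument ToXml(string input, string topName, string recordName,
        params string[] fieldNames)
    {
        // Step 1: drop the quotes and hide commas inside quoted fields behind a placeholder.
        string prepared = QuotedField.Replace(input,
            m => m.Value.Substring(1, m.Value.Length - 2).Replace(",", Placeholder));

        var doc = new XmlDocument();
        XmlElement top = doc.CreateElement(topName);
        doc.AppendChild(top);

        // Step 2: one record element per non-empty line.
        string[] lines = prepared.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            XmlElement record = doc.CreateElement(recordName);
            top.AppendChild(record);

            // Step 3: one child element per field, restoring the hidden commas.
            // Fields beyond the supplied names are ignored.
            string[] fields = line.Split(',');
            for (int i = 0; i < fields.Length && i < fieldNames.Length; i++)
            {
                XmlElement item = doc.CreateElement(fieldNames[i]);
                item.InnerText = fields[i].Replace(Placeholder, ",");
                record.AppendChild(item);
            }
        }
        return doc;
    }
}
```

For example, `CsvToXmlSketch.ToXml("1,\"Smith, John\",NY\n2,Jane,CA", "people", "person", "id", "name", "state")` produces a document with two person elements, and the first name element keeps its embedded comma.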