October 29th, 2007, 08:21 AM
How to build a long regular expression

Usually, when you write a regular expression to extract text, you start from a simple expression. As you get to know the target text, you extend it. Eventually the long string of special symbols becomes very hard to read and all but impossible to improve.

We need a 'smart' regular expression. Instead of writing the expression on one line, we prepare a multi-line text from which the long expression is generated. Here is a simple example.

    space                    [\s/-]+
    word                     \w+
    words                    (?:{word}{space})*?{word}
    birthday                 (?<birthday>\d+\.\d+\.\d+)
    title                    {word}\.
    name                     {words}
    person                   {title}{space}{name}{space}{birthday}

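This text consists of two columns separated by spaces. The first column is the pattern name and the second is an easy-to-read regular expression, in which {name} refers to another pattern by its name. Leaving aside the (?#name) comments that the class below places in front of each expanded pattern, the resulting regular expression for the pattern 'person' is: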
    \w+\.[\s/-]+(?:\w+[\s/-]+)*?\w+[\s/-]+(?<birthday>\d+\.\d+\.\d+)

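To see what the expanded expression actually matches, here is a minimal sketch that applies it to a line of text and reads the named group. The class name PersonPatternDemo, the method FindBirthday and the sample inputs are assumptions made for illustration. The dots in the birthday part are written escaped (\.) so that they match literal dots only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PersonPatternDemo
{
    // The expanded 'person' expression: title, separator, name, separator, birthday.
    public const string Person =
        @"\w+\.[\s/-]+(?:\w+[\s/-]+)*?\w+[\s/-]+(?<birthday>\d+\.\d+\.\d+)";

    // Returns the text captured by the 'birthday' group,
    // or null when the input contains no match for the person pattern.
    public static string FindBirthday(string text)
    {
        Match m = Regex.Match(text, Person);
        return m.Success ? m.Groups["birthday"].Value : null;
    }
}
```

With a hypothetical input such as "Mr. John Smith 29.10.2007", FindBirthday returns "29.10.2007". For text without a title followed by a dot it returns null.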
You can do this with the following class:
    using System;
    using System.Collections.Specialized;
    using System.IO;
    using System.Text.RegularExpressions;

    public class Lexer
    {
        // Matches {name} references, but not Unicode categories such as \p{L} or \P{L}.
        private static readonly Regex TokenPattern =
            new Regex(@"(?<!\\[pP])\{([a-zA-Z]\w*)\}");

        // Pattern name -> raw (unexpanded) expression.
        private readonly NameValueCollection col = new NameValueCollection();

        // Parses the two-column text: a pattern name, whitespace, then the expression.
        public static Lexer Create(string resource)
        {
            Lexer lex = new Lexer();
            using (StringReader sr = new StringReader(resource))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    Match m = Regex.Match(line, @"^\s*(\w+)\s+(.*)$");
                    if (m.Success)
                    {
                        // Assign through the indexer so that a repeated name replaces the
                        // earlier definition (Add would join both values with a comma).
                        lex.col[m.Groups[1].Value] = m.Groups[2].Value.Trim();
                    }
                }
            }
            return lex;
        }

        // Returns the named pattern with every {name} reference expanded recursively.
        // Definitions must not refer to themselves: cycles are not detected.
        public string GetExpression(string name)
        {
            if (string.IsNullOrEmpty(name)) return string.Empty;
            string res = col[name];
            if (res == null) throw new ArgumentException("Template not found: " + name, "name");

            // An alternation has to be grouped before it is embedded in another pattern.
            bool needGroup = res.IndexOf('|') >= 0;

            // Replace each {token} with its own fully expanded expression.
            res = TokenPattern.Replace(res, m => GetExpression(m.Groups[1].Value));

            string result = needGroup ? "(?:" + res + ")" : res;

            // The inline comment (?#name) labels the fragment inside the final expression.
            return "(?#" + name + ")" + result;
        }
    }


Then we can create an instance of the class and get the regular expression:
    Lexer lex = Lexer.Create(txtLexerText.Text);
    string expr = lex.GetExpression("person");
    Regex reg = new Regex(expr);

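Once the expression is built, the birthday can be read from the match through the named group. The sketch below is meant to be self-contained, so instead of the Lexer class it uses a small stand-in expander built on a dictionary and Regex.Replace. The names MiniLexer, Parse, Expand and FindBirthday, the embedded definitions and the sample input are all assumptions for illustration, and the stand-in does not add (?#name) comments or grouping for alternations.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MiniLexer
{
    // Hypothetical definitions, in the same two-column layout as the example text.
    public const string Definitions = @"
        space     [\s/-]+
        word      \w+
        words     (?:{word}{space})*?{word}
        birthday  (?<birthday>\d+\.\d+\.\d+)
        title     {word}\.
        name      {words}
        person    {title}{space}{name}{space}{birthday}";

    // Parses "name   expression" lines into a dictionary; other lines are skipped.
    public static Dictionary<string, string> Parse(string text)
    {
        var defs = new Dictionary<string, string>();
        foreach (string line in text.Split('\n'))
        {
            Match m = Regex.Match(line.Trim(), @"^(\w+)\s+(.*)$");
            if (m.Success) defs[m.Groups[1].Value] = m.Groups[2].Value;
        }
        return defs;
    }

    // Recursively replaces every {name} reference with its definition.
    // Unknown names throw KeyNotFoundException; cycles are not detected.
    public static string Expand(Dictionary<string, string> defs, string name)
    {
        return Regex.Replace(defs[name], @"(?<!\\[pP])\{([a-zA-Z]\w*)\}",
            m => Expand(defs, m.Groups[1].Value));
    }

    // Builds the 'person' expression and returns the captured birthday, or null.
    public static string FindBirthday(string text)
    {
        Regex reg = new Regex(Expand(Parse(Definitions), "person"));
        Match m = reg.Match(text);
        return m.Success ? m.Groups["birthday"].Value : null;
    }
}
```

For the hypothetical input "Mr. John Smith 29.10.2007", FindBirthday returns "29.10.2007". With the Lexer class the last step is the same: call Match on the Regex built from GetExpression("person") and read Groups["birthday"].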