📦 nanoFramework System.Text.RegularExpressions Class Library

Welcome to the .NET nanoFramework System.Text.RegularExpressions repository

Important: This Regular Expressions parser covers most common needs, but it has limitations with complex patterns and is not fully compatible with the full .NET implementation. It is a work in progress, mainly based on the .NET Micro Framework implementation. Please don't hesitate to open an issue if you run into a problem. Any help improving this parser is more than welcome.

The Tests include advanced tests, and so far only one of them fails. Help fixing the parser is needed!

Usage

The level of compatibility with the full framework is high: the Match and Group classes work as you would expect. The following examples give an idea of the usage:

using System.Diagnostics;
using System.Text.RegularExpressions;

// The example displays the following output:
//       Match: This is one sentence.
//          Group 1: 'This is one sentence.'
//             Capture 1: 'This is one sentence.'
//          Group 2: 'sentence'
//             Capture 1: 'This '
//             Capture 2: 'is '
//             Capture 3: 'one '
//             Capture 4: 'sentence'
//          Group 3: 'sentence'
//             Capture 1: 'This'
//             Capture 2: 'is'
//             Capture 3: 'one'
//             Capture 4: 'sentence'
string pattern = @"(\b(\w+?)[,:;]?\s?)+[?.!]";
string input = "This is one sentence. This is a second sentence.";

Match match = Regex.Match(input, pattern);
Debug.WriteLine("Match: " + match.Value);
int groupCtr = 0;
foreach (Group group in match.Groups)
{
    groupCtr++;
    Debug.WriteLine("   Group " + groupCtr + ": '" + group.Value + "'");
    int captureCtr = 0;
    foreach (Capture capture in group.Captures)
    {
        captureCtr++;
        Debug.WriteLine("      Capture " + captureCtr + ": '" + capture.Value + "'");
    }
}

Another example using Split:

Regex regex = new Regex("[ab]+");
string[] actualResults = regex.Split("xyzzyababbayyzabbbab123");
for (int i = 0; i < actualResults.Length; i++)
{
    Debug.WriteLine($"{actualResults[i]}");
}
// The results will be:
// xyzzy
// yyz
// 123

You can also use the Replace method:

Regex regex = new Regex("a*b");
string actual = regex.Replace("aaaabfooaaabgarplyaaabwackyb", "-");
Debug.WriteLine($"{actual}");
regex = new Regex("([a-b]+?)([c-d]+)");
actual = regex.Replace("zzabcdzz", "$1-$2");
Debug.WriteLine($"{actual}");
// The result will be:
// -foo-garply-wacky-
// zzab-cdzz

The next example shows how to use options:

Regex regex = new Regex("abc(\\w*)");
Debug.WriteLine("RegexOptions.IgnoreCase abc(\\w*)");
regex.Options = RegexOptions.IgnoreCase;
if (regex.IsMatch("abcddd"))
{
    Debug.WriteLine("abcddd = true");
}
regex = new Regex("^abc$", RegexOptions.Multiline);
if (regex.IsMatch("\nabc"))
{
    Debug.WriteLine("abc found!");
}
// The result will be:
// abcddd = true
// abc found!

Validated regular expressions

The tests use the following regular expressions, which you may find useful:

  • email addresses: ([\w\d_.\-]+)@([\d\w\.\-]+)\.([\w\.]{2,5})
  • http(s) URL: (https?:\/\/)([\da-z-._]+)/?([\/\da-z.-]*) (limitation: due to a bug in our engine, the URL has to end with a / to be extracted correctly; the expression works as expected when placed between ^ and $)
  • MD5: [a-f0-9]{32}
  • SHA256: [A-Fa-f0-9]{64}
  • Simple XML tag: <tag>[^<]*</tag>
  • GUID: [{]?[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}[}]?
  • Date time like 2021-04-10 18:08:42: (\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})
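As a sketch of how one of these expressions might be used, the hypothetical DateTimeExtractor class below applies the date time pattern and reads the six numbered groups. It assumes that Match.Success and the integer indexer on Match.Groups behave as in the full .NET framework; check both against the nanoFramework API before relying on them.

```csharp
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the library): pulls the numeric parts out of a
// "yyyy-MM-dd HH:mm:ss" timestamp using the date time pattern listed above.
public static class DateTimeExtractor
{
    private const string Pattern = @"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})";

    // Returns year, month, day, hour, minute and second, or null if nothing matches.
    public static int[] ExtractParts(string input)
    {
        Match match = Regex.Match(input, Pattern);
        if (!match.Success)
        {
            return null;
        }

        int[] parts = new int[6];
        for (int i = 0; i < parts.Length; i++)
        {
            // Groups[0] holds the whole match, so the six captures start at index 1.
            parts[i] = int.Parse(match.Groups[i + 1].Value);
        }

        return parts;
    }

    // Usage example: extracts a timestamp embedded in a longer line of text.
    public static void Demo()
    {
        int[] parts = ExtractParts("Log entry at 2021-04-10 18:08:42: boot ok");
        if (parts != null)
        {
            Debug.WriteLine("Year: " + parts[0] + ", month: " + parts[1] + ", second: " + parts[5]);
        }
    }
}
```

Because the pattern is not anchored, the timestamp is found even when it is surrounded by other text.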

Known limitations

This is a simple parser, and some elements are not supported:

  • Named groups such as (?<word>\w+) do not work. Plain groups are supported, but a ? at the start of a group, as used by named groups and similar constructs, is not.
  • Some escaped characters, such as \., may cause issues. In that case, use . instead, keeping in mind that an unescaped . matches any character, so the pattern becomes more permissive.
  • The order of the characters inside a character class can sometimes affect the result. If you run into this, try reordering the characters in a class such as [a-z-._].
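Since plain groups are supported, one possible workaround for the missing named groups is to use numbered groups and keep their indices in named constants. The hypothetical EmailParts class below does this with the e-mail expression from the validated list. This is a suggestion, not a documented library feature, and it assumes that Match.Success and the integer indexer on Match.Groups behave as in the full .NET framework.

```csharp
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the library): splits an e-mail address into
// its parts using numbered groups instead of (?<name>...) groups.
public static class EmailParts
{
    // Same pattern as the validated e-mail expression.
    private const string Pattern = @"([\w\d_.\-]+)@([\d\w\.\-]+)\.([\w\.]{2,5})";

    // Constants standing in for the names a named group would have provided.
    private const int UserGroup = 1;
    private const int DomainGroup = 2;
    private const int TopLevelGroup = 3;

    // Returns { user, domain, top-level part }, or null if the input does not match.
    public static string[] SplitAddress(string address)
    {
        Match match = Regex.Match(address, Pattern);
        if (!match.Success)
        {
            return null;
        }

        return new string[]
        {
            match.Groups[UserGroup].Value,
            match.Groups[DomainGroup].Value,
            match.Groups[TopLevelGroup].Value,
        };
    }
}
```

With the full .NET engine, SplitAddress("jane.doe@example.com") returns jane.doe, example and com: the greedy second group backtracks so that the last dot-separated part lands in the third group.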

Feedback and documentation

For documentation, feedback, issues and details on how to contribute, please refer to the Home repo.

Join our Discord community here.

Credits

The list of contributors to this project can be found at CONTRIBUTORS.

License

The nanoFramework Class Libraries are licensed under the MIT license.

Please check the headers of the files in this repository: some of the code is licensed under the Apache License 2.0.

Code of Conduct

This project has adopted the code of conduct defined by the Contributor Covenant to clarify expected behaviour in our community. For more information see the .NET Foundation Code of Conduct.

.NET Foundation

This project is supported by the .NET Foundation.