Hi, I'm Ray

29 Apr, 2020

C#: Implementing simple wildcard string matching without regular expressions

Programming, .NET, C#

In my previous post I walked-through how to use regex to perform string matching using “*” tokens. In that post I acknowledged that regex was slow and I inferred that native string operations are preferable for better speed.

I’m back to tell you that regex is incredibly-slow. Code is often a trade-off between convenience and performance; regex is a tool of convenience. If your application needs speed above all else, regular expressions is not the solution.

So, let’s revisit the implementation again without regex.

 

Let’s get started.

 

1. Re-use some old code

With reference to the previous post, we had good code at the end of that walk-through.

We’ll keep most of it but replace the regex methods with our own.


public bool IsWildcardMatch(string wildcardPattern, string subject)
{
	if (string.IsNullOrWhiteSpace(wildcardPattern))
	{
		return false;
	}

	int wildcardCount = wildcardPattern.Count(x => x.Equals('*'));
	if (wildcardCount <= 0)
	{
		return subject.Equals(wildcardPattern, StringComparison.CurrentCultureIgnoreCase);
	}
	else if (wildcardCount == 1)
	{
		string newWildcardPattern = wildcardPattern.Replace("*", "");

		if (wildcardPattern.StartsWith("*"))
		{
			return subject.EndsWith(newWildcardPattern, StringComparison.CurrentCultureIgnoreCase);
		}
		else if (wildcardPattern.EndsWith("*"))
		{
			return subject.StartsWith(newWildcardPattern, StringComparison.CurrentCultureIgnoreCase);
		}
		else
		{
			/// todo: replace regex
			// isRegexMatch(wildcardPattern, subject);
		}
	}
	else
	{
		/// todo: replace regex
		// isRegexMatch(wildcardPattern, subject);
	}
}

 

2. Replace regex-match

You can replicate the effect of regex matching by splitting the wildcard pattern and matching substrings. Each substring match progresses a position counter to ensure that subsequent matches are after the previous. This should make sense in the code below.

We also have to match the start and end of the string strictly.


protected bool isRegexMatch(string pattern, string subject)
{
	string[] parts = pattern.Split('*');
	if (parts.Length <= 1)
	{
		return subject.Equals(pattern, StringComparison.CurrentCultureIgnoreCase);
	}

	int pos = 0;

	for (int i = 0; i< parts.Length; i++)
	{
		if (i <= 0)
		{
			// first
			pos = subject.IndexOf(parts[i], pos, StringComparison.CurrentCultureIgnoreCase);
			if (pos != 0)
			{
				return false;
			}
		}
		else if (i >= (parts.Length - 1))
		{
			// last
			if (!subject.EndsWith(parts[i], StringComparison.CurrentCultureIgnoreCase))
			{
				return false;
			}
		}
		else
		{
			pos = subject.IndexOf(parts[i], pos, StringComparison.CurrentCultureIgnoreCase);
			if (pos < 0)
			{
				return false;
			}

			pos += parts[i].Length;
		}
	}

	return true;
}

 

That is it.

I hope someone finds this useful.