Using DirectoryInfo.GetFiles returns more files than expected (or how to get exactly what you need, with an exact extension match lookup)

Introduction

I must admit I hadn't noticed this behavior of the GetFiles() method until now. It doesn't come up often, but it can happen, and it's dangerous.
As this post and the MSDN library itself state, when you call GetFiles() with a search pattern that contains the asterisk wildcard and an extension exactly three characters long (like *.xml or *.jpg), the method returns every file whose extension STARTS with the one you provided. That means a search for *.jpg will return files with extensions such as .jpg, .jpg2, .jpgfileformat, etc.
This is rather odd behavior (and not too elegant, I should say), introduced to support the 8.3 file name format. As stated in the above-mentioned blog:
“A file with the name “alongfilename.longextension” has an equivalent 8.3 filename of “along~1.lon”. If we filter the extensions “.lon”, then the above 8.3 filename will be a match.”
That is why the GetFiles() method behaves this way. The official MSDN explanation:
Note
When using the asterisk wildcard character in a searchPattern (for example, "*.txt"), the matching behavior varies depending on the length of the specified file extension. A searchPattern with a file extension of exactly three characters returns files with an extension of three or more characters, where the first three characters match the file extension specified in the searchPattern. A searchPattern with a file extension of one, two, or more than three characters returns only files with extensions of exactly that length that match the file extension specified in the searchPattern. When using the question mark wildcard character, this method returns only files that match the specified file extension. For example, given two files in a directory, "file1.txt" and "file1.txtother", a search pattern of "file?.txt" returns only the first file, while a search pattern of "file*.txt" returns both files.
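The note is easy to check for yourself. The following is a minimal sketch, not part of the MSDN documentation: it creates the two files from the example in a scratch folder and lists what each pattern returns. The class and method names (GetFilesDemo, MatchNames) are made up for illustration. The result depends on the runtime and file system you run it on, so no output is shown here; on the .NET Framework, the note above says the asterisk pattern should return both files.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical demo class; only the System.IO calls come from the framework.
public static class GetFilesDemo
{
    // Creates "file1.txt" and "file1.txtother" in a fresh scratch folder and
    // returns the names that GetFiles reports for the given search pattern.
    public static string[] MatchNames(string searchPattern)
    {
        string folder = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        DirectoryInfo dinfo = Directory.CreateDirectory(folder);
        try
        {
            File.WriteAllText(Path.Combine(folder, "file1.txt"), "");
            File.WriteAllText(Path.Combine(folder, "file1.txtother"), "");
            return dinfo.GetFiles(searchPattern)
                        .Select(f => f.Name)
                        .OrderBy(n => n, StringComparer.Ordinal)
                        .ToArray();
        }
        finally
        {
            dinfo.Delete(true); // remove the scratch folder and its files
        }
    }

    public static void Main()
    {
        Console.WriteLine("file?.txt -> " + string.Join(", ", MatchNames("file?.txt")));
        Console.WriteLine("file*.txt -> " + string.Join(", ", MatchNames("file*.txt")));
    }
}
```

If the second line lists "file1.txtother" on your machine, you are seeing exactly the behavior described in the note.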
In my case, I had a bug in my software because I had temporarily renamed an XML file to xxx.XML2222, just to hide it from the application. The program kept reading it, which made it misbehave.

A workaround for this issue

If you want to prevent this behavior, you need to check the returned array of FileInfo objects manually and remove those that don't match your pattern. An elegant way to do so is to write an extension method for the DirectoryInfo class, like the following one:
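using System;
using System.Collections.Generic;
using System.IO;

// Extension methods must live in a static class; the class name here is only illustrative.
public static class DirectoryInfoExtensions
{
    /// <summary>
    /// Returns the files that match the search wildcard, with an exact (case-insensitive) match on the extension.
    /// </summary>
    /// <param name="dinfo">Directory to search</param>
    /// <param name="pSearchWildcard">Search wildcard, in the format: *.xml or file?.dat</param>
    /// <returns>Array of FileInfo objects</returns>
    public static FileInfo[] GetFilesByExactMatchExtension(this DirectoryInfo dinfo, string pSearchWildcard)
    {
        FileInfo[] files = dinfo.GetFiles(pSearchWildcard);
        if (files.Length == 0)
            return files;

        // Extension of the pattern, including the leading dot (".xml" for "*.xml").
        string extensionSearch = Path.GetExtension(pSearchWildcard);
        List<FileInfo> filtered = new List<FileInfo>();
        foreach (FileInfo finfo in files)
        {
            // Keep only files whose extension equals the requested one, ignoring case.
            if (string.Equals(finfo.Extension, extensionSearch, StringComparison.OrdinalIgnoreCase))
                filtered.Add(finfo);
        }
        return filtered.ToArray();
    }
}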
This way, right next to the regular GetFiles() method of the DirectoryInfo class, you will now find the brand new GetFilesByExactMatchExtension(), which has the desired behavior.
Note: To use this method from a class, as with any other extension method, you need to add a “using” directive for the extension method’s namespace.
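To make the namespace point concrete, here is a small sketch. The namespaces MyTools.IO and MyApp and the class names are invented for this example, and the method body is a compact LINQ variant of the filter rather than the loop shown earlier; treat it as an illustration under those assumptions.

```csharp
using System;
using System.IO;
using System.Linq;

namespace MyTools.IO // hypothetical namespace holding the extension method
{
    public static class DirectoryInfoExtensions
    {
        // Compact LINQ variant of the filter: keep only files whose extension
        // equals the pattern's extension, ignoring case.
        public static FileInfo[] GetFilesByExactMatchExtension(this DirectoryInfo dinfo, string pSearchWildcard)
        {
            string extensionSearch = Path.GetExtension(pSearchWildcard);
            return dinfo.GetFiles(pSearchWildcard)
                        .Where(f => string.Equals(f.Extension, extensionSearch, StringComparison.OrdinalIgnoreCase))
                        .ToArray();
        }
    }
}

namespace MyApp // hypothetical calling code
{
    using MyTools.IO; // without this directive the extension method is not visible

    public static class Program
    {
        public static void Main(string[] args)
        {
            // Folder to scan: first command-line argument, or the current directory.
            DirectoryInfo dinfo = new DirectoryInfo(args.Length > 0 ? args[0] : ".");
            foreach (FileInfo finfo in dinfo.GetFilesByExactMatchExtension("*.xml"))
                Console.WriteLine(finfo.Name);
        }
    }
}
```

If you remove the using MyTools.IO; line, the call on dinfo no longer compiles, which is the situation the note warns about.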
Hope it helps!