Don't forget about character casing when comparing strings!
There are many issues I’ve experienced during the many, many years I’ve worked as a software developer. But one of the most recurring issues is, without a doubt, the mismatching of words due to character casing.
There are solutions to the character casing mismatch problem. For example, you can make your strings all lower case or upper case before comparing them. There are also many programming languages that have features to help with string comparisons.
The problem when comparing strings and case sensitivity
The issue is no longer the lack of solutions to avoid this problem. The problem is that these solutions require that you, the developer, be proactive by being alert and aware of case sensitivity when making string comparisons. For example, in C#, you have the StringComparer class, which includes properties like StringComparer.OrdinalIgnoreCase to help you ignore the character’s casing when comparing strings.
As a developer, you have to be alert and know when to ignore character casing. While there are many simple ways and tools built into programming languages, sometimes knowing when to do this might not be obvious.
For example, if you call GroupBy in C# and select the value you want your list to be grouped by, it will consider values such as “Abc” and “ABC” as unique, which might not be what you want to do.
In most cases, if you group a list of items by a specific string value, your intention is probably to treat the same values “Abc” and “ABC” as the same. Therefore, you’ll want to ignore the casing as the values are the same in this context.
Issues like the one with GroupBy in C# can go unnoticed until it causes problems. For example, I ran into this issue and didn’t realize the mistake until I tried to add the values of that grouped list to a dictionary and failed. The dictionary attempted to use the values “Abc” and “ABC” as the dictionary key, but it failed since these aren’t unique.
What can you do to avoid casing issues when comparing strings?
So what can you do about this? Code defensively. Every time you compare strings, consider character casing sensitivity and avoid it easily by converting all your strings to upper case or lower case before comparing. Second, be aware of the use cases where you are calling a built-in function such as GroupBy or ToDictionary as functions like this might be case-sensitive within your programming language.
With the programming language C#, you can use overloads that explicitly specify the string comparison rules for string comparisons. It works in this language by calling a method overload that has a parameter of type StringComparison.
In the example below, I’ll be using StringComparison.OrdinalIgnoreCase for comparisons for culture-agnostic string matching. The example shows you how not ignoring case sensitivity might give you unexpected results.
Examples in C#
Let’s declare a list of books with author names written using different casing
var books = new List<Book>()
{
new Book { Name = "Programa en donde sea", Author = "Ricardo" },
new Book { Name = "Empieza a programar", Author = "ricardo" },
new Book { Name = "Xyz", Author = "Joe" },
new Book { Name = "Despues de la programacion", Author = "RICARDO" },
new Book { Name = "Blah", Author = "Foo" }
};
Let’s group the list of books by Author, but since we are not doing anything to ignore case sensitivity, the result is not what’s expected – It returns five records instead of three as it treats all variations of the name Ricardo as unique values.
var notAUniqueListOfBooks = books.GroupBy(b => b.Author);
Now let’s group the same list of books by author, but this time let’s add a parameter to make the string comparison case insensitive. The result is only three records, that’s because it treats all the variations of the Author name Ricardo as the same value.
var aUniqueListOfBooks = books.GroupBy(b => b.Author, StringComparer.OrdinalIgnoreCase);
Let’s now create a dictionary from the list of books. This dictionary will use the Author value as the key, and both the book’s name and author as the value. The result is five items in the dictionary, again, because it treats each instance of the author name Ricardo as a unique value due to the difference in the casing.
var notAUniqueBookDictionary = books.ToDictionary(b => b.Author, b => b);
Finally, we’ll try to create a dictionary following the same attributes above, but this time, we’ll pass the parameter StringComparer.OrdinalIgnoreCase to make sure the comparison is case insensitive.
The result of this last one is an error with the following message:
“An item with the same key has already been added. Key: ricardo”
This is because since we are ignoring the casing in Author, we cannot create a dictionary as the key values are required to be unique and by ignoring the case of the different variations of the value Ricardo, these are no longer unique. They all end up being the same exact value.
var aUniqueBookDictionary = books.ToDictionary(b => b.Author, b => b, StringComparer.OrdinalIgnoreCase);
Finally, using the examples above, if you wanted to group by Author, and then create a list of all of their books including the name and author values then you could try using ToLookup, and pass the StringComparer parameter to make sure the string comparison in a case insensitive.
var aUniqueLookup = books.ToLookup(b => b.Author, b => b, StringComparer.OrdinalIgnoreCase);
The above will give you a dictionary where the key is the Author name and the value is a list of books including name and author. Also, by passing the StringComparer.OrdinalIgnoreCase parameter, we are making sure that the result is a unique list of values.
This is the result of our book list when converted into a Lookup object in C#. There are three keys, all unique, and under each key, we have a list of books that corresponds to the book’s author representing the Key value.
I hope this is useful, the code I used to test the examples above is all available here if you want to play with it and explore changing the values, parameters, etc. Cheers and happy coding!