Categories
Coding

Don’t forget about character casing when comparing strings!

There are many issues I’ve experienced during the many, many years I’ve worked as a software developer. But one of the most recurring issues is, without a doubt, the mismatching of words due to character casing.

There are solutions to the character casing mismatch problem. For example, you can make your strings all lower case or upper case before comparing them. There are also many programming languages that have features to help with string comparisons.

The issue is no longer the lack of solutions to avoid this problem. The problem is that these solutions require that you, the developer, be proactive by being alert and aware of case sensitivity when making string comparisons. For example, in C#, you have the StringComparer class, which includes properties like StringComparer.OrdinalIgnoreCase to help you ignore the character’s casing when comparing strings.

As a developer, you have to be alert and know when to ignore character casing. While there are many simple ways and tools built into programming languages, sometimes knowing when to do this might not be obvious.

For example, if you call GroupBy in C# and select the value you want your list to be grouped by, it will consider values such as “Abc” and “ABC” as unique, which might not be what you want to do.

In most cases, if you group a list of items by a specific string value, your intention is probably to treat values such as “Abc” and “ABC” as the same. Therefore, you’ll want to ignore the casing, as the values are equivalent in this context.

Issues like the one with GroupBy in C# can go unnoticed until they cause problems. For example, I ran into this issue and didn’t realize the mistake until I tried to add the values of that grouped list to a dictionary and it failed. The dictionary attempted to use both “Abc” and “ABC” as keys, but it failed because, once casing is ignored, these aren’t unique.

So what can you do about this? Code defensively. First, every time you compare strings, consider whether casing should matter; if it shouldn’t, an easy fix is converting both strings to upper case or lower case before comparing. Second, be aware of the cases where you call a built-in function such as GroupBy or ToDictionary, as functions like these might be case-sensitive in your programming language.

With C#, you can call method overloads that state the comparison rule explicitly. Methods such as string.Equals accept a parameter of type StringComparison, while LINQ methods such as GroupBy and ToDictionary accept an equality comparer such as StringComparer.OrdinalIgnoreCase.
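As a minimal sketch of the two approaches mentioned so far, the snippet below compares normalizing both strings with passing a StringComparison value to string.Equals. The class and method names (CasingExamples, SameExact, SameAfterUpper, SameIgnoringCase) are made up for illustration; only the framework calls are real.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class CasingExamples
{
    // The == operator is case-sensitive: "Abc" and "ABC" are different.
    public static bool SameExact(string a, string b) => a == b;

    // Normalizing both sides works, but creates two new strings
    // and throws if either argument is null.
    public static bool SameAfterUpper(string a, string b) =>
        a.ToUpperInvariant() == b.ToUpperInvariant();

    // Passing a StringComparison value states the rule explicitly
    // and also handles null arguments.
    public static bool SameIgnoringCase(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
}
```

Calling CasingExamples.SameIgnoringCase("Abc", "ABC") returns true, while SameExact returns false for the same pair.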

In the examples below, I’ll be using StringComparer.OrdinalIgnoreCase for culture-agnostic, case-insensitive string matching. The examples show how not ignoring case sensitivity might give you unexpected results.

Examples in C#

Let’s declare a list of books with author names written using different casing:

var books = new List<Book>()
{
    new Book { Name = "Programa en donde sea", Author = "Ricardo" },
    new Book { Name = "Empieza a programar", Author = "ricardo" },
    new Book { Name = "Xyz", Author = "Joe" },
    new Book { Name = "Despues de la programacion", Author = "RICARDO" },
    new Book { Name = "Blah", Author = "Foo" }
};

Let’s group the list of books by Author. Since we are not doing anything to ignore case sensitivity, the result is not what’s expected: it returns five records instead of three, as it treats all variations of the name Ricardo as unique values.

var notAUniqueListOfBooks = books.GroupBy(b => b.Author);

Now let’s group the same list of books by author, but this time let’s add a parameter to make the string comparison case-insensitive. The result is only three records because it treats all the variations of the Author name Ricardo as the same value.

var aUniqueListOfBooks = books.GroupBy(b => b.Author, StringComparer.OrdinalIgnoreCase);

Let’s now create a dictionary from the list of books. This dictionary will use the Author value as the key, and the whole book (name and author) as the value. The result is five items in the dictionary, again because it treats each instance of the author name Ricardo as a unique value due to the difference in casing.

var notAUniqueBookDictionary = books.ToDictionary(b => b.Author, b => b);

Finally, we’ll try to create a dictionary following the same attributes above, but this time, we’ll pass the parameter StringComparer.OrdinalIgnoreCase to make sure the comparison is case insensitive.

The result of this last one is an error with the following message:

“An item with the same key has already been added. Key: ricardo”

This happens because dictionary keys are required to be unique, and once we ignore the casing in Author, the different variations of the value Ricardo are no longer unique. They all end up being the exact same value.

var aUniqueBookDictionary = books.ToDictionary(b => b.Author, b => b, StringComparer.OrdinalIgnoreCase);
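If what you actually want is a dictionary keyed by author regardless of casing, one possible way around the duplicate-key error is to group first and then build the dictionary from the groups. This is only a sketch: the Book class mirrors the shape used in the examples above, and BookIndex.ByAuthor is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the Book class used in the examples above.
public class Book
{
    public string Name { get; set; } = "";
    public string Author { get; set; } = "";
}

// Hypothetical helper, for illustration only.
public static class BookIndex
{
    // Grouping with the case-insensitive comparer first guarantees that
    // each key is unique under the same comparer the dictionary uses.
    public static Dictionary<string, List<Book>> ByAuthor(IEnumerable<Book> books) =>
        books
            .GroupBy(b => b.Author, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.ToList(), StringComparer.OrdinalIgnoreCase);
}
```

Because the dictionary itself is created with the same comparer, looking up "ricardo", "Ricardo" or "RICARDO" reaches the same entry.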

Finally, using the examples above, if you wanted to group by Author and then create a list of all of each author’s books, including the name and author values, you could try using ToLookup and pass the StringComparer parameter to make sure the string comparison is case-insensitive.

var aUniqueLookup = books.ToLookup(b => b.Author, b => b, StringComparer.OrdinalIgnoreCase);

The above will give you a lookup, which works much like a dictionary: the key is the Author name and the value is a list of books including name and author. Also, by passing the StringComparer.OrdinalIgnoreCase parameter, we are making sure that the result has a unique list of keys.

This is the result of our book list when converted into a Lookup object in C#. There are three keys, all unique, and under each key we have the list of books whose author matches the Key value.
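To make that structure concrete, here is a small sketch of building and walking a lookup. The Book class is assumed to have the same shape as in the examples above, and LookupDemo is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the Book class used in the examples above.
public class Book
{
    public string Name { get; set; } = "";
    public string Author { get; set; } = "";
}

// Hypothetical helper, for illustration only.
public static class LookupDemo
{
    public static ILookup<string, Book> ByAuthor(IEnumerable<Book> books) =>
        books.ToLookup(b => b.Author, b => b, StringComparer.OrdinalIgnoreCase);

    // Each element of a lookup is a group: a key plus the items under it.
    public static void Print(ILookup<string, Book> lookup)
    {
        foreach (var group in lookup)
        {
            Console.WriteLine($"{group.Key}: {group.Count()} book(s)");
            foreach (var book in group)
            {
                Console.WriteLine($"  {book.Name}");
            }
        }
    }
}
```

Unlike a dictionary, indexing a lookup with a key that doesn’t exist returns an empty sequence instead of throwing.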

I hope this is useful. The code I used to test the examples above is all available here if you want to play with it and explore changing the values, parameters, etc. Cheers and happy coding!


Parallelism. Using Parallel.For and ConcurrentBag.

Parallelism refers to the technique of running multiple calculations at the same time to speed up a computer program. Historically, parallel code has been hard to write, requiring developers to manipulate threads and locks at a low level.

A program will generally run faster if you allow it to execute multiple calculations at the same time. For example, you might have a program where you need to check how many orders a customer has. Instead of looping through the customers one at a time to check on their orders, you could check on multiple customers at the same time by using something like Parallel.For or Parallel.ForEach.

Code example:

// Requires: using System.Collections.Concurrent; using System.Threading.Tasks;
private IEnumerable<Orders> MyMethod(List<Orders> orders)
{
    // Collect the results in a ConcurrentBag so multiple threads can add to it safely.
    var result = new ConcurrentBag<Orders>();

    Parallel.ForEach(orders, item =>
    {
        // Some data manipulation
        result.Add(new Orders(/* constructor parameters */));
    });

    return result;
}

The .NET Framework makes writing parallel code a much simpler task than before. The .NET Framework 4.0 introduced a variety of enhancements and additions, such as a runtime, class library types, and diagnostic tools, to help developers write safe and efficient parallel code.


The benefits

The benefit of parallel programming is the ability to execute multiple instructions at the same time. This can make your program faster by reducing the time the same code would take to execute sequentially. While this is a great way to speed up your code, you should still consider other ideas as well, and not use the framework features around parallelism before knowing more about them. Believe me, I know from personal experience, unfortunately.

The disadvantages

The disadvantages of parallel code are the increased CPU usage (something to be aware of) and the potential for issues when using collection objects that aren’t thread-safe. Thread-safe means multiple threads can access the shared data without corrupting it. When using something like Parallel.For, you want to use a thread-safe collection such as ConcurrentBag<T>. Bags are useful for storing objects when ordering doesn’t matter and, unlike sets, bags support duplicates. If you need your collection to be ordered, remember to sort it after converting it to a List<>.
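The pattern described above (collect into a ConcurrentBag<T>, then sort once the parallel work is done) might look like the following sketch. ParallelSquares and SquareAll are hypothetical names, and squaring integers simply stands in for real work.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class ParallelSquares
{
    // Squares each input in parallel, then restores a deterministic order.
    public static List<int> SquareAll(IEnumerable<int> numbers)
    {
        // ConcurrentBag allows Add calls from several threads at once.
        var bag = new ConcurrentBag<int>();

        Parallel.ForEach(numbers, n => bag.Add(n * n));

        // The bag makes no ordering promise, so sort after the loop finishes.
        return bag.OrderBy(x => x).ToList();
    }
}
```

Sorting only after Parallel.ForEach returns keeps the ordering step single-threaded, so it needs no extra synchronization.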

As with everything else, test your code and find out if using the Parallel library or PLINQ in your existing scenario is the right thing for it or not. While it might seem that running things in parallel will always be faster, this isn’t always true. Read more about it here.

Happy coding!


An introduction to Single Responsibility principle (SRP)

This is the fifth and last article describing SOLID principles. This article is about the Single Responsibility principle. Hopefully it will help you understand what the principle is all about and why it’s important to keep it in mind when designing and writing your code.

What is the Single Responsibility principle?

Here is the definition from Wikipedia: The term was introduced by Robert C. Martin in an article by the same name as part of his Principles of Object Oriented Design, made popular by his book Agile Software Development, Principles, Patterns, and Practices. Martin described it as being based on the principle of cohesion, as described by Tom DeMarco in his book Structured Analysis and Systems Specification.


An introduction to Open Closed principle (OCP)

This is the fourth article on SOLID principles which I started a few weeks ago. I hope this is useful for you and that it gives you a simple understanding of what the Open/Closed principle is all about.

What is the Open Closed principle?

Bertrand Meyer coined the term Open/Closed Principle, which appeared in his book titled Object Oriented Software Construction in 1988. The principle reads “software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification”.
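One common way to read that sentence in C# is sketched below, using an interface as the extension point. All of the names here (IShape, Circle, Rect, AreaCalculator) are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Extension point: new shapes implement this interface.
public interface IShape
{
    double Area();
}

public class Circle : IShape
{
    public double Radius { get; set; }
    public double Area() => Math.PI * Radius * Radius;
}

public class Rect : IShape
{
    public double Width { get; set; }
    public double Height { get; set; }
    public double Area() => Width * Height;
}

public static class AreaCalculator
{
    // Closed for modification: supporting a new shape means adding a class,
    // not editing this method.
    public static double TotalArea(IEnumerable<IShape> shapes) =>
        shapes.Sum(s => s.Area());
}
```

Adding, say, a Triangle class that implements IShape would extend the behavior without touching AreaCalculator.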


An introduction to Liskov substitution principle (LSP)

This post is about the Liskov substitution principle (LSP). This is also the third of my posts about SOLID principles, following the posts I wrote about DI and ISP in the past few weeks.

What is Liskov substitution principle (LSP)?

This principle is based on Barbara Liskov’s definition of subtyping, commonly known as the Liskov substitution principle which states the following:

if S is a subtype of T, then objects of type T in a program may be replaced with objects of type S without altering any of the desirable properties of that program

Or, in my own words: in programming we cannot always model our objects on real-life objects, so we need to make sure subtypes respect their parents. Using the illustration below, in order to follow this principle we need to make sure that the subtypes (duck, cuckoo, ostrich) respect the parent class (bird). This means that in our code, we should be able to replace bird with duck, cuckoo or ostrich.

The above illustration shows clearly how in object-oriented programming (OOP) we can reuse some of our classes by making use of inheritance. This also shows very simply how using a base/parent class can be very beneficial and it is a main part of OOP.

So what is the problem, you might ask? What is the purpose of the Liskov substitution principle? Its purpose is to help you avoid problems when modeling your objects, by making you aware of issues with inheritance that aren’t so obvious at the time of development.

The idea is to keep the LSP in mind when developing our classes so a parent class like “bird” can point to any of its child class objects i.e. “duck”, “cuckoo” or “ostrich” during runtime without any issues.
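A minimal sketch of that idea follows. The class names come from the bird example in the text, while the Name and Eat members and the Aviary helper are assumptions added for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public abstract class Bird
{
    public abstract string Name { get; }

    // Behavior every bird is expected to support.
    public virtual string Eat() => $"{Name} is eating";
}

public class Duck : Bird { public override string Name => "Duck"; }
public class Cuckoo : Bird { public override string Name => "Cuckoo"; }
public class Ostrich : Bird { public override string Name => "Ostrich"; }

// Hypothetical helper, for illustration only.
public static class Aviary
{
    // Written against Bird only; it should behave correctly for any subtype.
    public static List<string> FeedAll(IEnumerable<Bird> birds) =>
        birds.Select(b => b.Eat()).ToList();
}
```

FeedAll never checks which concrete type it receives, and that is the point: any child class can stand in for Bird at runtime without surprises.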

A classic example of LSP violation

One of the most classic examples of LSP violation is the use of a Rectangle class as the parent of a Square class. At first this seems like the right thing to do, as in mathematics a square is a rectangle. However, in code this is not always true, and this is what I meant when I wrote above that you cannot always model your objects in code as you would in the real world.

Below is an example of how in code, something like deriving a class of type Square from a Rectangle class might seem ideal…

public class Rectangle
{
    public double Height { get; set; }
    public double Width { get; set; }
}

public class Square : Rectangle
{
    // Static factory: builds a square whose height and width are both w.
    public static Square CreateSquare(double w)
    {
        return new Square { Height = w, Width = w };
    }
}

The example above looks fine: you can call the method CreateSquare, passing a value that is assigned to both the height and the width of the object, which results in a properly formed square where all sides are equal. The problem arises with code written for a rectangle whose height and width have different values. If you substitute a Square object for that rectangle, you will get unexpected results, as the Square object expects its height and width properties to have the same value all the time.

However, the Liskov substitution principle says that you should be able to substitute any of the child classes for its base class and the example of the rectangle/square clearly breaks that rule.
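The sketch below shows one way the violation can surface. It is a variation on the classes above rather than the same code: it assumes the properties are virtual, adds an Area property, and uses a hypothetical Square that overrides the setters to keep its sides equal. The Resizer helper is also made up for illustration.

```csharp
using System;

public class Rectangle
{
    public virtual double Height { get; set; }
    public virtual double Width { get; set; }
    public double Area => Height * Width;
}

// Hypothetical Square that enforces equal sides by overriding the setters.
public class Square : Rectangle
{
    public override double Height
    {
        get => base.Height;
        set { base.Height = value; base.Width = value; }
    }

    public override double Width
    {
        get => base.Width;
        set { base.Width = value; base.Height = value; }
    }
}

public static class Resizer
{
    // A caller that only knows about Rectangle expects an area of 2 * 5 = 10.
    public static double ResizeAndMeasure(Rectangle r)
    {
        r.Width = 2;
        r.Height = 5;
        return r.Area;
    }
}
```

ResizeAndMeasure(new Rectangle()) returns 10, but ResizeAndMeasure(new Square()) returns 25, because setting the height also changes the width. The caller’s reasonable expectation about Rectangle no longer holds for the subtype.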

The Liskov Substitution Principle (LSP) is one of the SOLID principles. It can help you avoid common mistakes during software development that are hard to notice if you are not thinking about them when using inheritance and modeling your classes/objects.

Happy coding!