Categories
Coding

Don’t forget about character casing when comparing strings!

There are many issues I’ve experienced during the many, many years I’ve worked as a software developer. But one of the most recurring issues is, without a doubt, the mismatching of words due to character casing.

There are solutions to the character casing mismatch problem. For example, you can make your strings all lower case or upper case before comparing them. There are also many programming languages that have features to help with string comparisons.

The issue is no longer the lack of solutions to avoid this problem. The problem is that these solutions require that you, the developer, be proactive by being alert and aware of case sensitivity when making string comparisons. For example, in C#, you have the StringComparer class, which includes properties like StringComparer.OrdinalIgnoreCase to help you ignore the character’s casing when comparing strings.

As a developer, you have to be alert and know when to ignore character casing. While there are many simple ways and tools built into programming languages, sometimes knowing when to do this might not be obvious.

For example, if you call GroupBy in C# and select the value you want your list to be grouped by, it will consider values such as “Abc” and “ABC” as unique, which might not be what you want to do.

In most cases, if you group a list of items by a specific string value, your intention is probably to treat the same values “Abc” and “ABC” as the same. Therefore, you’ll want to ignore the casing as the values are the same in this context.

Issues like the one with GroupBy in C# can go unnoticed until it causes problems. For example, I ran into this issue and didn’t realize the mistake until I tried to add the values of that grouped list to a dictionary and failed. The dictionary attempted to use the values “Abc” and “ABC” as the dictionary key, but it failed since these aren’t unique.

So what can you do about this? Code defensively. Every time you compare strings, consider character casing sensitivity and avoid it easily by converting all your strings to upper case or lower case before comparing. Second, be aware of the use cases where you are calling a built-in function such as GroupBy or ToDictionary as functions like this might be case-sensitive within your programming language.

With the programming language C#, you can use overloads that explicitly specify the string comparison rules for string comparisons. It works in this language by calling a method overload that has a parameter of type StringComparison.

In the example below, I’ll be using StringComparison.OrdinalIgnoreCase for comparisons for culture-agnostic string matching. The example shows you how not ignoring case sensitivity might give you unexpected results.

Examples in C#

Let’s declare a list of books with author names written using different casing

var books = new List<Book>()
{
new Book { Name = "Programa en donde sea", Author = "Ricardo" },
new Book { Name = "Empieza a programar", Author = "ricardo" },
new Book { Name = "Xyz", Author = "Joe" },
new Book { Name = "Despues de la programacion", Author = "RICARDO" },
new Book { Name = "Blah", Author = "Foo" }
};

Let’s group the list of books by Author, but since we are not doing anything to ignore case sensitivity, the result is not what’s expected – It returns five records instead of three as it treats all variations of the name Ricardo as unique values.

var notAUniqueListOfBooks = books.GroupBy(b => b.Author);

Now let’s group the same list of books by author, but this time let’s add a parameter to make the string comparison case insensitive. The result is only three records, that’s because it treats all the variations of the Author name Ricardo as the same value.

var aUniqueListOfBooks = books.GroupBy(b => b.Author, StringComparer.OrdinalIgnoreCase);

Let’s now create a dictionary from the list of books. This dictionary will use the Author value as the key, and both the book’s name and author as the value. The result is five items in the dictionary, again, because it treats the each instance of the author name Ricardo as a unique value due to the difference in casing.

var notAUniqueBookDictionary = books.ToDictionary(b => b.Author, b => b);

Finally, we’ll try to create a dictionary following the same attributes above, but this time, we’ll pass the parameter StringComparer.OrdinalIgnoreCase to make sure the comparison is case insensitive.

The result if this last one is an error with the following message:

“An item with the same key has already been added. Key: ricardo”

This is because since we are ignoring the casing in Author, we cannot create a dictionary as the key values are required to be unique and by ignoring the case of the different variations of the value Ricardo, these are no longer unique. They all end up being the same exact value.

var aUniqueBookDictionary = books.ToDictionary(b => b.Author, b => b, StringComparer.OrdinalIgnoreCase);

Finally, using the examples above, if you wanted to group by Author, and then create a list of all of their books including the name and author values then you could try using ToLookup, and pass the StringComparer parameter to make sure the string comparison in case insensitive.

var aUniqueLookup = books.ToLookup(b => b.Author, b => b, StringComparer.OrdinalIgnoreCase);

The above will give you a dictionary where the key is the Author name and the value is a list of books including name and author. Also, by passing the StringComparer.OrdinalIgnoreCase parameter, we are making sure that the result is a unique list of values.

This the result of our book list when converted into a Lookup object in C#. There are three keys, all unique, and under each key we have a list of books that corresponds to the book’s author representing the Key value.

I hope this is useful, the code I used to test the examples above is all available here if you want to play with it and explore changing the values, parameters, etc. Cheers and happy coding!

Categories
Coding

Mac OS: xcrun: error: invalid active developer path

While watching one of the Microsoft Build 2021 sessions, I ran into the following error message while trying to clone a project from Github using the git clone command line on my Mac. The entire error message is below:

xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun

Thankfully, the fix for this is pretty straightforward, install the Xcode toolkit! I had done this before, but for some reason, I had to do it again, and only after doing this, my git commands started to work while executing them in the terminal.

To install the Xcode toolkit, use the following command in your Mac OS terminal:

xcode-select --install

If that doesn’t work, try the following:

sudo xcode-select --install

And if that also doesn’t work, you can try the following next:

sudo xcode-select --reset

If the command line options above don’t work for you, you can download the Xcode toolkit manually from Apple’s Developer download page.

Categories
Coding How-To

How to (efficiently) update millions of records in a SQL table

You need to update one or more columns in a table with millions of records with data from another table, what’s a good way of going this with low risk and in an optimized way? One of my favorite ways of dealing with millions of records in a table is processing inserts, deletes, or updates in batches.

Updating data in batches of 10,000 records at a time and using a transaction is a simple and efficient way of performing updates on millions of records. The key is to batch out your data to allow you to process a smaller number of records at a time.

Example:

use DatabaseName
go

DECLARE @batchId INT
DECLARE @batchSize INT
DECLARE @results INT

SET @results = 1
SET @batchSize = 10000
SET @batchId = 0

-- when 0 rows returned, exit the loop
WHILE (@results > 0)
	BEGIN
		BEGIN TRAN;

		UPDATE Table1 SET columnA = Table2.columnA
		FROM Table2
		INNER JOIN Table1 ON Table2.Id = Table1.Id
		WHERE Table1.columnA is null
		AND (Table2.Id > @batchId
		AND Table2.Id <= @batchId + @batchSize)

		SET @results = @@ROWCOUNT
	
		-- next batch
		SET @batchId = @batchId + @batchSize

		COMMIT TRAN;
	END

-- the sql below is just to measure the performance of this update, it is not needed to update your data.

DECLARE @startTime DATETIME

SELECT  @startTime = GETDATE()
SET STATISTICS PROFILE ON
SELECT  COUNT(*)
FROM Table1 first OPTION (RECOMPILE)
SET STATISTICS PROFILE OFF
SELECT  'It took ' + CONVERT(VARCHAR(8), DATEDIFF(ms, @startTime, GETDATE())) + ' ms'
GO

In the query above we declare the following variables to do the batching:

  • @results – this variable will hold the number of records updated; when zero, the query will stop. This is to avoid an infinite loop.
  • @batchId – this is set to zero initially, and it is used to compare the table id against it, and after each update, it is set to the id plus the batch size. This allows splitting the millions of records into batches.
  • @batchSize – the number of records to update at a time.

It is important to know that the above algorithm will only work when your table’s primary keys are of type int. If your table’s primary keys are guids, the above will not work.

The section at the end of the query is only used to review the performance of your updates, it is not necessary to update the data and it is a nice way to see how the query performs overall.

I hope the information in this post is helpful, and it helps you learn a simple way to update millions of records in batches, making it easier and lower risk than attempting to do it all at once.

Categories
Coding Notes

Enabling software engineering teams for success.

Software development is hard, and it isn’t always the programming language or the framework you use, it’s the people who work on it.

People are an essential part of a team; everything can be easily changed and fixed, but you need to make sure people work well together to achieve effective communication and a great culture. In my software development career of over 20 years, the critical difference between successful software projects and failed ones has been how engineering teams are created and how they are allowed to function over time.

Categories
Coding How-To

Must-have VS Code Settings for Web Development

Last week I wrote about must-have VS Code extensions for web development. This time, I’m writing about the recommended settings to make sure these extensions and VS Code work correctly. 

VS Code is an
excellent tool for writing software, and when you install the right extensions, it can be very powerful. However, if these extensions and VS Code isn’t set up correctly, many of its features or extensions might not work the way you expect them to work.

Below is a list of settings that I use as my base when setting up VS Code for any type of web development. You can also get them from the gist located here:

https://gist.github.com/ricardodsanchez/179f693c8c843abc782e3e40112904aa

The way to get to the setting in VS Code is by typing Ctrl-Shift P and then type “settings”. A settings.json file will open and this is where you can remove all existing settings and paste the settings below.

{
  "telemetry.enableTelemetry": false,
  "telemetry.enableCrashReporter": false,
  "files.autoSave": "afterDelay",
  "files.associations": {
    "*.md": "markdown"
  },
  "window.zoomLevel": 0,
  "[html]": {
    "editor.defaultFormatter": "vscode.html-language-features"
  },
  "breadcrumbs.enabled": true,
  "editor.codeActionsOnSave": {
    "source.fixAll.eslint": true
  },
  "editor.acceptSuggestionOnEnter": "on",
  "editor.detectIndentation": false,
  "editor.fontFamily": "'Source Code Pro', Menlo, Consolas, Courier, monospace",
  "editor.fontSize": 13,
  "editor.defaultFormatter": "esbenp.prettier-vscode",
  "editor.formatOnSave": true,
  "editor.multiCursorModifier": "ctrlCmd",
  "editor.quickSuggestions": {
    "other": true,
    "comments": true,
    "strings": true,
  },
  "editor.renderWhitespace": "none",
  "editor.snippetSuggestions": "top",
  "editor.tabSize": 2,
  "editor.wordWrap": "on",
  "editor.wrappingIndent": "indent",
  "extensions.ignoreRecommendations": true,
  "extensions.showRecommendationsOnlyOnDemand": true,
  "files.exclude": {
    "**/._*": true
  },
  "files.insertFinalNewline": true,
  "html.format.endWithNewline": true,
  "html.format.extraLiners": "body",
  "html.format.indentInnerHtml": true,
  "html.format.unformatted": "b,em,i,span,strong,wbr",
  "html.format.wrapLineLength": 0,
  "liveServer.settings.donotShowInfoMsg": true,
  "liveServer.settings.port": 0,
  "workbench.colorCustomizations": {
    "editorIndentGuide.activeBackground": "#FFA500"
  },
  "workbench.editor.tabSizing": "shrink",
  "workbench.editor.enablePreview": false,
  "workbench.editor.enablePreviewFromQuickOpen": false,
  "gitlens.advanced.messages": {
    "suppressGitDisabledWarning": true
  }
}

That’s all for all folks, remember to subscribe and please let me know if you have any comments or questions below.