Today I learned SQL Server’s default encoding automatically equates similar characters to each other for sorting/searching. Normally that’s a good thing (e.g. so c and รง sort the same way) but today I wanted to find a specific unicode character…
I needed to find (and replace) all “Narrow Non-Break Space” characters but searching for LIKE '%' + NCHAR(8239) + '%'
was also finding ordinary spaces.
The answer was to tell it to use a binary comparison when evaluating that field with COLLATE Latin1_General_BIN
:
SELECT *
FROM Comments
WHERE Comment COLLATE Latin1_General_BIN LIKE '%' + NCHAR(8239) + '%'
to find them and:
UPDATE Comments
SET Comment = REPLACE(
Comment COLLATE Latin1_General_BIN,
NCHAR(8239) COLLATE Latin1_General_BIN,
' '
)
to replace them with regular spaces.