When data has to be matched without a unique identifier, text-based similarity is widely used. Comparing texts and finding out which ones are similar serves as a guideline for making matching decisions.
Text-based similarity can be defined in various ways, e.g. by counting the number of common letters or by counting how many changes are required to transform one term into the other. Each of these approaches has its strengths and weaknesses depending on the type of text being compared (e.g. single words, whole sentences, technical names). To name a few of these (distance) measures:
Most of them measure how many changes (operations) are required to transform one text into the other. The decision for or against a match is then based on a threshold value. This is a fair approach, but finding the right threshold is not easy. Additionally, this method lacks the human judgment that might work even better, depending on the data.
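As a rough sketch of how such a threshold decision can look, the following Python snippet computes the Levenshtein distance (one common edit-distance measure) and accepts a match when the distance stays below a limit. The function names `levenshtein` and `is_match` and the default threshold of 2 are assumptions made for illustration only; they are not taken from any particular tool.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def is_match(a: str, b: str, max_distance: int = 2) -> bool:
    """Hypothetical decision rule: match if at most `max_distance` edits apart.
    The threshold is an arbitrary illustrative value."""
    return levenshtein(a.lower(), b.lower()) <= max_distance


print(is_match("Jon Adams", "John Adams"))         # one missing letter
print(is_match("John Adams", "John Quincy Adams")) # seven inserted characters
```

Whether a limit of 2 is right depends entirely on the data: short codes tolerate fewer edits than long names, which is exactly why choosing the threshold is the hard part.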
To illustrate when text-based similarity gets beaten by context-driven similarity, a few examples will be shown. They concern John Adams and his son John Quincy Adams, the 2nd and 6th presidents of the United States of America.
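The effect can be sketched in a few lines of Python using the standard library's `difflib.SequenceMatcher`. The three strings below are hypothetical stand-ins chosen for this sketch, not records from a real data set: two of them describe the same person in different words, while two others look alike but refer to father and son.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] based purely on shared character runs."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical example records (assumed for illustration).
father = "John Adams"
son = "John Q. Adams"
father_by_role = "2nd US President"  # same person as `father`, described by context

# Different people, but the strings look very much alike.
print(round(text_similarity(father, son), 2))

# Same person, but the strings share almost no characters.
print(round(text_similarity(father, father_by_role), 2))
```

A purely text-based rule would rank the father/son pair as the better match, while a human reader who knows the context would pair the name with the role description instead.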
The point here is not to disparage text-based similarity. The purpose is to show that human brain power has its place in the data matching landscape as long as AIs are not smart enough for general tasks such as context-based similarity.
Not all data matching tasks benefit from context-driven matching the way the example above does. But there are cases where no AI (not yet) and no text-distance-driven algorithm exceeds what a human brain can achieve.
To make the most of human brain power, a powerful tool is required. Handling matches can get confusing quickly, and there is an inherent risk of messing up the data. This is typically the issue when people want or need to match data themselves and then struggle to do it efficiently. One thing is clear: spreadsheet software is not sufficient for data matching, neither for text-based nor for context-based approaches.