
Fuzzy String Matching in Python


Fuzzy matching comes into play when simple comparison operators cannot recognize that two differently written strings refer to the same thing, for example when removing duplicates. For example, "Microsoft Inc." and "Microsoft" represent the same organization, but an equality comparison treats them as two different strings.



>>> str1 = "Microsoft Inc."
>>> str2 = "Microsoft Inc."
>>> result = str1 == str2
>>> print(result)
True


The comparison above returns True because both strings match exactly, character for character. If one of them is misspelled or its case changes, the result is False.
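
A quick sketch of that failure mode using only built-in string comparison; the variant strings below are made-up examples (a case change, a typo and a shortened name):

```python
base = "Microsoft Inc."
# Hypothetical variants: a case change, a typo, and a shortened name
variants = ["microsoft inc.", "Micorsoft Inc.", "Microsoft"]

# Exact comparison: any difference at all makes the result False
exact = {v: base == v for v in variants}

# Case-insensitive comparison fixes only the case change
case_insensitive = {v: base.casefold() == v.casefold() for v in variants}

print(exact)             # all three are False
print(case_insensitive)  # only "microsoft inc." is True
```

Normalizing case handles one kind of variation, but typos and extra words still need an approximate comparison.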

To handle such cases, we can use the FuzzyWuzzy package in Python to compare strings and measure how similar they are.

Fuzzy logic is a methodology for evaluating "degrees of truth" rather than the usual Boolean "true or false" approach. Truth values can lie anywhere between 0 and 1.

Fuzzy string matching applies approximate string matching rather than exact string matching.


The FuzzyWuzzy package in Python uses the popular Levenshtein distance to compute a similarity ratio between two strings, reported as an integer score from 0 (no similarity) to 100 (identical).
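
To make the idea concrete, here is a minimal pure-Python sketch of the Levenshtein distance (the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other), plus one common way to normalize it into a 0-1 similarity. The similarity helper and its normalization are assumptions for illustration only; FuzzyWuzzy computes its own ratio through difflib.SequenceMatcher or, when installed, the python-Levenshtein package, so its numbers will not always match this one.

```python
def levenshtein(a: str, b: str) -> int:
    # Dynamic programming over prefixes, keeping only the previous row
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    # Hypothetical normalization: 1 - distance / length of the longer string
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest

print(levenshtein("Microsoft Inc.", "Microsoft"))            # 5
print(round(similarity("Microsoft Inc.", "Microsoft"), 2))   # 0.64
```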

Let's see how it works in a Jupyter notebook:


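Launch a blank Jupyter notebook for Python and install the required package, for example with pip install fuzzywuzzy (installing python-Levenshtein as well is optional and speeds up the scoring).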



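Import the fuzz module from fuzzywuzzy, for example with from fuzzywuzzy import fuzz (the package also provides a process module for matching a string against a list of choices):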



We can now apply the scoring functions and get similarity scores between strings:
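
For reference, a basic 0-100 ratio score can be sketched with the standard library alone. The ratio helper below is hypothetical: it scales difflib.SequenceMatcher's 0-1 ratio to a 0-100 integer, which, as far as I know, is what fuzz.ratio falls back to when python-Levenshtein is not installed. With FuzzyWuzzy installed you would simply call fuzz.ratio(str1, str2).

```python
from difflib import SequenceMatcher

def ratio(a: str, b: str) -> int:
    """Hypothetical stand-in for a 0-100 similarity score."""
    if not a or not b:
        return 0  # treat empty input as no similarity
    # SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    # matching characters and T is the total length of both strings
    return round(100 * SequenceMatcher(None, a, b).ratio())

print(ratio("Microsoft Inc.", "Microsoft Inc."))  # 100
print(ratio("Microsoft Inc.", "Microsoft"))       # 78
print(ratio("Microsoft Inc.", "microsoft inc."))  # below 100: this score is case-sensitive
```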



fuzz.partial_ratio() scores how well the shorter string matches the best-matching substring of the longer one. Given two strings X and Y, where X is the shorter string with length p, it computes the FuzzyWuzzy ratio between X and every substring of length p of Y, and returns the maximum of those scores.
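
The description above translates almost directly into a brute-force sketch: slide a window of length p over the longer string and keep the best ratio. Both helpers are hypothetical stand-ins built on difflib. FuzzyWuzzy's own partial_ratio, as far as I know, only tries the windows suggested by SequenceMatcher's matching blocks rather than every substring, so its scores can occasionally differ from this exhaustive version.

```python
from difflib import SequenceMatcher

def ratio(a: str, b: str) -> int:
    # 0-100 score from difflib (hypothetical stand-in for a plain ratio)
    if not a or not b:
        return 0
    return round(100 * SequenceMatcher(None, a, b).ratio())

def partial_ratio(x: str, y: str) -> int:
    # Let the shorter string have length p
    shorter, longer = (x, y) if len(x) <= len(y) else (y, x)
    p = len(shorter)
    if p == 0:
        return 0
    # Compare the shorter string with every length-p substring of the longer one
    return max(ratio(shorter, longer[i:i + p])
               for i in range(len(longer) - p + 1))

print(partial_ratio("Microsoft", "Microsoft Inc."))  # 100
print(ratio("Microsoft", "Microsoft Inc."))          # 78
```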



fuzz.token_sort_ratio ignores word order, so two strings that contain the same words in a different order get the same score:
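
A sketch of the token-sort idea: normalize both strings, split them into words, sort the words, and compare the rejoined strings. The normalization here (lowercasing and turning non-alphanumeric characters into spaces) is meant to approximate FuzzyWuzzy's default preprocessing, and the helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher

def ratio(a: str, b: str) -> int:
    # 0-100 score from difflib (hypothetical stand-in for a plain ratio)
    if not a or not b:
        return 0
    return round(100 * SequenceMatcher(None, a, b).ratio())

def sorted_tokens(s: str) -> str:
    # Lowercase, treat punctuation as whitespace, then sort the words
    words = re.sub(r"\W+", " ", s).lower().split()
    return " ".join(sorted(words))

def token_sort_ratio(a: str, b: str) -> int:
    return ratio(sorted_tokens(a), sorted_tokens(b))

a = "New York Mets vs Atlanta Braves"
b = "Atlanta Braves vs New York Mets"
print(ratio(a, b))             # below 100: word order lowers the plain score
print(token_sort_ratio(a, b))  # 100
```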






fuzz.token_set_ratio compares the sets of tokens in the two strings, so repeating a token does not change the score:
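
A sketch of the token-set idea, following the commonly described recipe: build the sorted intersection of the two token sets, append each string's leftover tokens to it, and return the best pairwise ratio. The helper names are hypothetical and the preprocessing only approximates FuzzyWuzzy's.

```python
import re
from difflib import SequenceMatcher

def ratio(a: str, b: str) -> int:
    # 0-100 score from difflib (hypothetical stand-in for a plain ratio)
    if not a or not b:
        return 0
    return round(100 * SequenceMatcher(None, a, b).ratio())

def tokens(s: str) -> set:
    # Set of lowercase words; duplicates collapse automatically
    return set(re.sub(r"\W+", " ", s).lower().split())

def token_set_ratio(a: str, b: str) -> int:
    ta, tb = tokens(a), tokens(b)
    common = " ".join(sorted(ta & tb))
    # Common tokens followed by each side's leftover tokens
    with_a = (common + " " + " ".join(sorted(ta - tb))).strip()
    with_b = (common + " " + " ".join(sorted(tb - ta))).strip()
    return max(ratio(common, with_a), ratio(common, with_b), ratio(with_a, with_b))

print(token_set_ratio("fuzzy fuzzy was a bear", "fuzzy was a bear"))  # 100
print(token_set_ratio("Microsoft", "Microsoft Inc."))  # 100: one token set contains the other
```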



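Based on the type of data set you are using, you can choose the scorer that fits it best and apply it to your data: partial_ratio when one string may be contained in another, token_sort_ratio when words can appear in a different order, and token_set_ratio when strings may contain repeated or extra words. Python offers flexible and simple options for handling text similarity problems.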
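
As one illustration of applying a scorer to data, the sketch below removes near-duplicate company names using a token-sort style score. The dedupe helper, the sample names and the threshold of 90 are all hypothetical choices; in practice the threshold needs tuning on your own data, and with FuzzyWuzzy installed the scoring function could be swapped for one of its fuzz scorers.

```python
import re
from difflib import SequenceMatcher

def token_sort_score(a: str, b: str) -> int:
    # Hypothetical helper: lowercase, strip punctuation, sort words, score 0-100
    ta = " ".join(sorted(re.sub(r"\W+", " ", a).lower().split()))
    tb = " ".join(sorted(re.sub(r"\W+", " ", b).lower().split()))
    if not ta or not tb:
        return 0
    return round(100 * SequenceMatcher(None, ta, tb).ratio())

def dedupe(names: list, threshold: int = 90) -> list:
    # Keep a name only if it scores below the threshold against every kept name
    kept = []
    for name in names:
        if all(token_sort_score(name, k) < threshold for k in kept):
            kept.append(name)
    return kept

names = ["Microsoft Inc.", "microsoft inc", "Inc. Microsoft", "Apple", "apple"]
print(dedupe(names))  # ['Microsoft Inc.', 'Apple']
```

Keeping the first occurrence of each group is a simple design choice; a real pipeline might instead keep the most frequent or most complete spelling.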

