
Fuzzy String Matching in Python


Fuzzy matching comes into play when simple comparison operators cannot recognize that two slightly different strings refer to the same thing, for example when removing duplicates. "Microsoft Inc." and "Microsoft" represent the same organization, but a comparison operator such as == treats them as two different strings.



>>> str1 = "Microsoft Inc."
>>> str2 = "Microsoft Inc."
>>> result = str1 == str2
>>> print(result)
True


The comparison above returns True because both strings match exactly, character for character. If one of them is misspelled or its case is changed, the result is False.
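A quick check of those failure modes with plain Python comparisons. The variant strings are illustrative; lower-casing both sides fixes a case difference but does nothing for a dropped suffix or a typo.

```python
base = "Microsoft Inc."
variants = ["Microsoft Inc.", "Microsoft", "microsoft inc.", "Mircosoft Inc."]

for v in variants:
    # == is all-or-nothing: True only for a character-for-character match.
    exact = base == v
    # Normalizing case helps only when case is the sole difference.
    case_insensitive = base.lower() == v.lower()
    print(f"{v!r:18} exact={exact}  case-insensitive={case_insensitive}")
```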

To handle such cases, we can use the FuzzyWuzzy package in Python to compare strings and measure how similar any two pieces of text are.

Fuzzy logic is a methodology for evaluating "degrees of truth" rather than the usual Boolean "true or false" approach. Truth values can vary anywhere between 0 and 1.

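Fuzzy string matching applies the same idea to text: it performs approximate rather than exact string matching, scoring how similar two strings are instead of only reporting whether they are equal.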
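To make approximate matching concrete without any third-party package, here is a sketch using Python's standard difflib module (not FuzzyWuzzy). get_close_matches keeps the candidates whose similarity to the query, a number between 0 and 1, reaches the cutoff; the company list and the misspelled query are made up for the example.

```python
from difflib import get_close_matches

companies = ["Microsoft Inc.", "Apple Inc.", "Alphabet Inc.", "Amazon.com Inc."]
query = "Mircosoft Inc"  # misspelled, and missing the trailing period

# Exact lookup fails...
print(query in companies)

# ...but approximate matching still finds the intended entry.
# cutoff is the minimum similarity (0..1) a candidate must reach.
matches = get_close_matches(query, companies, n=1, cutoff=0.6)
print(matches)
```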


The FuzzyWuzzy package uses the popular Levenshtein distance to calculate a similarity ratio between two strings, expressed as a score from 0 to 100.
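For intuition, here is a minimal sketch of the classic Levenshtein distance (the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other), plus one simple, hypothetical way to turn it into a 0 to 100 score. The similarity() normalization is an illustrative assumption and is not the exact formula FuzzyWuzzy uses, so its numbers will not match FuzzyWuzzy's.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between a and b."""
    # prev[j] = distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (free if the chars match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> int:
    """Hypothetical 0-100 score: 100 * (1 - distance / length of the longer string)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # two empty strings are treated as identical
    return round(100 * (1 - levenshtein(a, b) / longest))


print(levenshtein("kitten", "sitting"))            # 3
print(levenshtein("Microsoft", "Microsoft Inc."))  # 5
print(similarity("Microsoft", "Microsoft Inc."))   # 64
```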

Let's see how it works in a Jupyter notebook:


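Launch a blank Jupyter notebook for Python and install the required package (for example, with pip install fuzzywuzzy; installing python-Levenshtein as well makes it faster):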



Import the fuzz module from fuzzywuzzy (from fuzzywuzzy import fuzz):



We can now start applying the algorithm and get the similarity scores between the strings :
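As a rough, dependency-free stand-in for FuzzyWuzzy's basic fuzz.ratio() score, the sketch below scales difflib.SequenceMatcher's 0 to 1 similarity to 0 to 100. FuzzyWuzzy itself falls back to difflib's SequenceMatcher when python-Levenshtein is not installed; simple_ratio is a hypothetical name, and the sample strings are illustrative.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Rough stand-in for fuzz.ratio: SequenceMatcher similarity (0..1) scaled to 0..100."""
    return round(100 * SequenceMatcher(None, s1, s2).ratio())


print(simple_ratio("this is a test", "this is a test!"))  # 97
print(simple_ratio("Microsoft Inc.", "Microsoft"))        # 78
```

A plain ratio penalizes the extra " Inc." heavily, which is what motivates the partial and token-based variants.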



fuzz.partial_ratio() compares the shorter string against substrings of the longer string and returns the best score. Given two strings X and Y, let the shorter string X have length p. It computes the FuzzyWuzzy ratio between X and every substring of Y of length p, and returns the maximum of those scores.
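The description above can be sketched as a brute-force sliding window using difflib. This is an illustration, not FuzzyWuzzy's code: the real fuzz.partial_ratio() only checks windows aligned with SequenceMatcher's matching blocks, so its scores can occasionally differ. The function name brute_partial_ratio is hypothetical.

```python
from difflib import SequenceMatcher


def brute_partial_ratio(s1: str, s2: str) -> int:
    """Best match between the shorter string and every same-length
    substring (window) of the longer string, scaled to 0..100."""
    shorter, longer = sorted((s1, s2), key=len)
    p = len(shorter)
    if p == 0:
        return 0  # hypothetical convention for empty input
    best = 0.0
    # Slide a window of length p across the longer string.
    for start in range(len(longer) - p + 1):
        window = longer[start:start + p]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return round(100 * best)


print(brute_partial_ratio("Microsoft", "Microsoft Inc."))             # 100
print(brute_partial_ratio("Microsoft", "The Microsoft Corporation"))  # 100
```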



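fuzz.token_sort_ratio() ignores word order: it sorts the tokens (words) of each string before comparing them, so strings containing the same words in a different order get the same score: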
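A minimal sketch of the token-sort idea with the standard library: normalize each string (lower-case, punctuation turned into spaces), sort its words, then compare with difflib. The normalization mirrors FuzzyWuzzy's default preprocessing only approximately, and the function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(s: str) -> str:
    """Lower-case, turn non-word characters into spaces, collapse whitespace."""
    return " ".join(re.sub(r"\W", " ", s).lower().split())


def sketch_token_sort_ratio(s1: str, s2: str) -> int:
    # Sorting the tokens makes word order irrelevant before comparing.
    sorted1 = " ".join(sorted(normalize(s1).split()))
    sorted2 = " ".join(sorted(normalize(s2).split()))
    return round(100 * SequenceMatcher(None, sorted1, sorted2).ratio())


print(sketch_token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
print(sketch_token_sort_ratio("Microsoft Inc.", "Inc. Microsoft"))                  # 100
```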






fuzz.token_set_ratio() ignores repeated tokens: it compares the sets of words in each string, so repeating a word does not change the score:
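A sketch of the token-set approach, assuming the commonly described scheme: build the shared tokens, append each side's leftover tokens, and take the best of three pairwise comparisons. It uses difflib instead of FuzzyWuzzy, skips some of FuzzyWuzzy's preprocessing (such as stripping non-ASCII characters), and all function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    return round(100 * SequenceMatcher(None, a, b).ratio())


def sketch_token_set_ratio(s1: str, s2: str) -> int:
    # Sets drop repeated tokens; lower-case and strip punctuation first.
    t1 = set(re.sub(r"\W", " ", s1).lower().split())
    t2 = set(re.sub(r"\W", " ", s2).lower().split())
    if not t1 or not t2:
        return 0  # nothing to compare
    common = " ".join(sorted(t1 & t2))
    # Shared tokens followed by each side's leftover tokens.
    rest1 = (common + " " + " ".join(sorted(t1 - t2))).strip()
    rest2 = (common + " " + " ".join(sorted(t2 - t1))).strip()
    # Best of the three pairwise comparisons.
    return max(_ratio(common, rest1), _ratio(common, rest2), _ratio(rest1, rest2))


print(sketch_token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
print(sketch_token_set_ratio("Microsoft Inc.", "Microsoft Microsoft Inc."))  # 100
```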



Based on the type of data set you are using, you can choose among these options and apply them to your data. Python offers flexible and simple options for handling text similarity problems.
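One hypothetical way to apply a scorer to a data set is removing near-duplicate names: keep a name only if it does not score above a threshold against any name already kept. The token-sort style score built on difflib, the dedupe function, the sample list and the threshold of 85 are all assumptions for illustration; with FuzzyWuzzy installed you would plug in whichever fuzz scorer suits your data. The pairwise loop is O(n²), so it suits small lists only.

```python
import re
from difflib import SequenceMatcher


def score(a: str, b: str) -> int:
    """Token-sort style similarity (0..100) built on difflib."""
    def key(s: str) -> str:
        return " ".join(sorted(re.sub(r"\W", " ", s).lower().split()))
    return round(100 * SequenceMatcher(None, key(a), key(b)).ratio())


def dedupe(names: list[str], threshold: int = 85) -> list[str]:
    """Keep a name only if it scores below threshold against every kept name."""
    kept: list[str] = []
    for name in names:
        if all(score(name, k) < threshold for k in kept):
            kept.append(name)
    return kept


companies = ["Microsoft Inc.", "microsoft inc", "Inc. Microsoft", "Apple Inc.", "Apple Inc"]
print(dedupe(companies))  # ['Microsoft Inc.', 'Apple Inc.']
```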

