
Fuzzy String Matching in Python


Fuzzy matching comes into play when simple comparison operators cannot recognize that two slightly different strings refer to the same thing, which is a common problem when removing duplicates. For example, "Microsoft Inc." and "Microsoft" represent the same organization, but an equality comparison treats them as different and says nothing about how similar they are.



>>> Str1 = "Microsoft Inc."
>>> Str2 = "Microsoft Inc."
>>> Result = Str1 == Str2
>>> print(Result)
True


The comparison above returns True because the two strings match exactly, character for character. If one of them is misspelled or differs only in case, the result is False.
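To make the limitation concrete, here is a small sketch using only built-in string comparison. The variant spellings are made up for illustration.

```python
# Exact comparison is all-or-nothing: any difference yields False.
name = "Microsoft Inc."

same = name == "Microsoft Inc."          # identical strings
case_changed = name == "microsoft inc."  # only the case differs
misspelled = name == "Micorsoft Inc."    # two letters transposed
shortened = name == "Microsoft"          # suffix dropped

print(same, case_changed, misspelled, shortened)

# Normalizing case repairs one kind of mismatch,
# but it does not help with typos or dropped words.
normalized_match = name.lower() == "microsoft inc.".lower()
print(normalized_match)
```

Lowercasing both sides handles the case difference, yet the misspelled and shortened variants still compare as unequal, which is the gap fuzzy matching is meant to fill.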

To handle such cases, we can use the FuzzyWuzzy package in Python to compare strings and score how similar they are.

Fuzzy logic is a methodology that evaluates "degrees of truth" rather than the usual Boolean "true or false". Truth values can fall anywhere between 0 and 1.

Fuzzy string matching applies approximate string matching rather than exact string matching.


The FuzzyWuzzy package in Python uses the popular Levenshtein distance to compute a similarity ratio between two strings.
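For readers unfamiliar with Levenshtein distance, the following is a minimal pure-Python sketch of the classic dynamic-programming computation. The function names `levenshtein` and `similarity` are illustrative, and the normalization in `similarity` is an assumption chosen for simplicity; it is not necessarily the formula FuzzyWuzzy applies internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # previous[j] is the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Illustrative normalization to [0, 1]; not FuzzyWuzzy's exact formula."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("Microsoft Inc.", "Microsoft"))
print(similarity("Microsoft Inc.", "Microsoft"))
```

A distance of 0 means the strings are identical; larger values mean more edits are needed, so a similarity score falls as the distance grows.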

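Let's see how it works in a Jupyter notebook.


Launch a blank Jupyter notebook for Python and install the required package, which is published on PyPI as fuzzywuzzy (for example, with pip install fuzzywuzzy).



Import the fuzz module with from fuzzywuzzy import fuzz.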



We can now apply the algorithm and get similarity scores between strings.
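A basic comparison might look like the sketch below. `fuzz.ratio` is FuzzyWuzzy's simple similarity function and returns an integer score from 0 to 100. The `ImportError` fallback built on the standard library's `difflib` is an assumption added here so the example still runs when the package is not installed; it is only a stand-in for the library.

```python
from difflib import SequenceMatcher

try:
    from fuzzywuzzy import fuzz  # third-party; assumed installed via pip
    ratio = fuzz.ratio
except ImportError:
    # Assumed stand-in so the sketch runs without the package.
    def ratio(a, b):
        return round(100 * SequenceMatcher(None, a, b).ratio())

score_same = ratio("Microsoft Inc.", "Microsoft Inc.")
score_near = ratio("Microsoft Inc.", "Microsoft")

print(score_same)  # identical strings get the maximum score
print(score_near)  # a partial match scores somewhere below the maximum
```

Unlike the equality operator, the score degrades gradually, so you can pick a threshold that suits your data instead of accepting only exact matches.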



fuzz.partial_ratio() compares the shorter string against substrings of the longer one. Given two strings X and Y, let the shorter string X have length p. The function computes the FuzzyWuzzy ratio between X and every substring of length p of the longer string, and returns the maximum of those scores.
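The sliding-window description above can be sketched with the standard library alone. The names `simple_ratio` and `partial_ratio_sketch` are hypothetical, and this naive version only mirrors the description in this post; the library's own implementation may pick candidate substrings differently and return slightly different scores.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Similarity from 0 to 100 based on difflib's matching-blocks ratio."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio_sketch(x: str, y: str) -> int:
    """Best score of the shorter string against every equal-length
    substring of the longer string (naive sliding window)."""
    shorter, longer = (x, y) if len(x) <= len(y) else (y, x)
    p = len(shorter)
    if p == 0:
        return 0
    # Every substring of length p in the longer string
    windows = (longer[i:i + p] for i in range(len(longer) - p + 1))
    return max(simple_ratio(shorter, window) for window in windows)


print(simple_ratio("Microsoft", "Microsoft Inc."))
print(partial_ratio_sketch("Microsoft", "Microsoft Inc."))
```

Because "Microsoft" appears verbatim inside "Microsoft Inc.", one window matches it exactly and the partial score reaches the maximum, while the plain ratio is penalized for the extra characters.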



fuzz.token_sort_ratio() ignores word order, so two strings that contain the same tokens in a different order get the same score as if the order matched.
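The idea of ignoring order can be sketched by sorting the tokens before comparing. `token_sort_ratio_sketch` is a hypothetical helper built on `difflib`; it is a simplification that only lowercases and splits on whitespace, so it should be read as an illustration of the idea and not as the library's exact preprocessing.

```python
from difflib import SequenceMatcher


def token_sort_ratio_sketch(a: str, b: str) -> int:
    """Lowercase, split on whitespace, sort the tokens, then compare."""
    def normalize(text: str) -> str:
        return " ".join(sorted(text.lower().split()))

    return round(100 * SequenceMatcher(None, normalize(a), normalize(b)).ratio())


plain = round(100 * SequenceMatcher(None, "Inc. Microsoft", "Microsoft Inc.").ratio())
sorted_score = token_sort_ratio_sketch("Inc. Microsoft", "Microsoft Inc.")

print(plain)         # order matters for a plain character comparison
print(sorted_score)  # sorting the tokens first removes the order difference
```

With the package installed, the corresponding call would be fuzz.token_sort_ratio(a, b).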






fuzz.token_set_ratio() gives the same score regardless of how often a token is repeated in either string.
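One way to get this behaviour is to reduce each string to its set of unique tokens and compare the shared tokens against each side's leftovers. The sketch below follows that intersection-based idea using `difflib`; the function name `token_set_ratio_sketch` is hypothetical and the details are assumptions, so scores may differ from what the library returns.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_ratio_sketch(a: str, b: str) -> int:
    """Compare unique tokens only, so repeated words do not affect the score."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())

    shared = " ".join(sorted(tokens_a & tokens_b))
    only_a = " ".join(sorted(tokens_a - tokens_b))
    only_b = " ".join(sorted(tokens_b - tokens_a))

    # Shared tokens followed by each side's leftover tokens
    combined_a = (shared + " " + only_a).strip()
    combined_b = (shared + " " + only_b).strip()

    return max(
        _ratio(shared, combined_a),
        _ratio(shared, combined_b),
        _ratio(combined_a, combined_b),
    )


print(token_set_ratio_sketch("Microsoft Microsoft Inc.", "Microsoft Inc."))
print(token_set_ratio_sketch("Microsoft Inc.", "Apple Inc."))
```

With the package installed, the corresponding call would be fuzz.token_set_ratio(a, b).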



Based on the type of data set you are working with, you can choose among these options and apply them to your data. Python offers flexible and simple options for handling text similarity problems.

