Posts

Showing posts from February, 2020

Fuzzy String Matching on Python

Fuzzy matching comes into play when simple comparison operators cannot tell that two slightly different strings refer to the same thing, for example while removing duplicate records. "Microsoft Inc." and "Microsoft" represent the same organization, but a comparison operator fails to capture the similarity between those terms.

>>> Str1 = "Microsoft Inc."
>>> Str2 = "Microsoft Inc."
>>> Result = Str1 == Str2
>>> print(Result)
True

This simple comparison returns True because the two strings match exactly, with no difference in the pattern. If one of them is misspelled or its case changes, the result is False.

To handle such cases, we can use the FuzzyWuzzy package in Python to compare any two textual patterns and measure how similar they are. Fuzzy logic evaluates "degrees of truth" rather than the usual Boolean "true or false" approach: truth values may range between 0 and 1. Fuzzy Strin...
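To illustrate scoring similarity instead of testing strict equality, here is a minimal sketch that uses only Python's standard library. It relies on `difflib.SequenceMatcher`, which FuzzyWuzzy itself falls back to when python-Levenshtein is not installed; the helper names `similarity` and `is_same_entity` and the threshold of 90 are hypothetical choices for this example, not FuzzyWuzzy's API.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Score two strings from 0 (nothing in common) to 100 (identical).

    Scales difflib's SequenceMatcher ratio to 0-100. This is a stand-in
    for FuzzyWuzzy's fuzz.ratio, not that library's own code.
    """
    return round(100 * SequenceMatcher(None, a, b).ratio())


def is_same_entity(a: str, b: str, threshold: int = 90) -> bool:
    """Treat two names as duplicates if their case-folded score meets threshold.

    The default threshold of 90 is an arbitrary, illustrative choice.
    """
    return similarity(a.casefold(), b.casefold()) >= threshold


str1 = "Microsoft Inc."
str2 = "microsoft inc"

print(str1 == str2)                # strict equality fails on case and punctuation
print(similarity(str1, str2))      # partial similarity on the raw strings
print(is_same_entity(str1, str2))  # case-folded score clears the threshold
```

Plain ratio scoring penalizes extra words, so "Microsoft" against "Microsoft Inc." still scores well below 100; FuzzyWuzzy's `fuzz.partial_ratio` and token-based scorers such as `fuzz.token_set_ratio` are designed for that kind of mismatch.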

Using Google Connected Sheets

Google has released a beta version of Google Connected Sheets, which lets you work with much bigger data sets in Google Sheets. It makes data from BigQuery, Google's analytics data warehouse, available in web-based Google Sheets. With this feature, the usual Sheets row limits no longer apply, and you can import and analyze billions of records from Sheets. Google took a leap on the UI side with Connected Sheets, which effectively becomes the face of BigQuery data. Connected Sheets provides a new integration between Sheets and BigQuery that is designed to make analyzing data stored in BigQuery very easy, with no knowledge of SQL required. Changes in Sheets: you may find one extra option called "Sheets Data Connector". It enables users to import and export larger data sets, up to 10 billion rows of BigQuery data, without using any SQL commands or scripts. Other Sheets features, such as formulas, pivot tables and add-ons, remain intact. GCP us...