All You Need to Know About TF-IDF

What is TF-IDF? The acronym stands for ‘Term Frequency-Inverse Document Frequency’. It is a numerical statistic used to evaluate how important a keyword is to your content/page. Simply put, it’s an information retrieval technique that weighs a keyword’s frequency within your content/page against how common that keyword is across a wider collection of documents.

[Image: TF-IDF explained]

The TF-IDF technique weighs a keyword within the content and tells you how important that specific keyword is, based on how many times it’s mentioned in the content and how many other documents also use it. Each keyword has its own TF and IDF score, and the product of the two gives the TF-IDF weight of that keyword. The higher the TF-IDF score (weight), the more often the keyword appears on your page relative to how common it is across other documents.

Checking the Relevance of a Keyword

The IDF part of this technique checks how common a keyword is throughout the web, or more generally throughout a collection of documents known as the corpus. For the keyword/term $t$ in a document $d$, the weight $W_{t,d}$ of the keyword is calculated as follows:

$$W_{t,d} = TF_{t,d} \times \log\left(\frac{N}{DF_t}\right)$$

Okay, don’t panic yet. Let’s break it down:

  • $TF_{t,d}$ is the number of times $t$ appears in document $d$ (often divided by the total number of words in $d$).
  • $DF_t$ is the number of documents that contain the keyword $t$.
  • $N$ is the total number of documents in the corpus.
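To make the formula concrete, here is a minimal Python sketch of it. The function name `tf_idf_weight` is hypothetical, and the base-10 logarithm is an assumption: some implementations use the natural logarithm or add smoothing terms, which changes the absolute numbers but not the idea.

```python
import math

def tf_idf_weight(tf: float, n_docs: int, df: int) -> float:
    """W_{t,d} = TF_{t,d} * log10(N / DF_t)  (base 10 is an assumption).

    tf     -- term frequency of t in document d (raw count or length-normalized)
    n_docs -- N, the total number of documents in the corpus
    df     -- DF_t, the number of documents that contain t
    """
    if not 1 <= df <= n_docs:
        raise ValueError("DF_t must be between 1 and N")
    return tf * math.log10(n_docs / df)

# A term found in every document gets IDF = log10(1) = 0, so its weight is 0
print(tf_idf_weight(tf=7, n_docs=100, df=100))  # 0.0
# A rarer term: 3 occurrences, found in 10 of 100 documents -> 3 * log10(10)
print(tf_idf_weight(tf=3, n_docs=100, df=10))   # 3.0
```

The first call shows the point of IDF: a term used in every document carries no weight, however often it appears.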

Defining Term Frequency (TF) with an Example

For instance, if a 300-word document contains the keyword/term “Casino” 15 times, the term frequency (TF) of the keyword is calculated as follows:

$$TF_{casino} = \frac{15}{300} = 0.05$$

Defining Inverse Document Frequency (IDF) with an Example

For example, let’s say our corpus (standing in for the entire web) contains 1 million documents.

Now, let’s assume that 0.5 million of those documents contain “Casino”. The IDF is the logarithm of the total number of documents (1 000 000) divided by the number of documents that actually contain the word “Casino” (500 000). Using a base-10 logarithm:

$$IDF_{casino} = \log_{10}\left(\frac{1\,000\,000}{500\,000}\right) = \log_{10}(2) \approx 0.301$$

$$W_{casino} = TF_{casino} \times IDF_{casino} = 0.05 \times 0.301 \approx 0.015$$
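Here is the “Casino” arithmetic in Python, using the example’s inputs (a 300-word document with 15 mentions, and 500 000 of 1 000 000 documents containing the word) and assuming a base-10 logarithm:

```python
import math

# Inputs from the "Casino" example (the corpus figures are hypothetical)
occurrences = 15            # times "Casino" appears in the document
doc_length = 300            # total words in the document
n_docs = 1_000_000          # N: documents in the corpus
docs_with_term = 500_000    # DF: documents containing "Casino"

tf = occurrences / doc_length               # 15 / 300 = 0.05
idf = math.log10(n_docs / docs_with_term)   # log10(2), about 0.301
weight = tf * idf                           # about 0.015

print(f"TF = {tf:.2f}, IDF = {idf:.3f}, TF-IDF = {weight:.3f}")
# TF = 0.05, IDF = 0.301, TF-IDF = 0.015
```

With the natural logarithm instead, the IDF would be about 0.693 and the weight about 0.035. Changing the base multiplies every IDF by the same constant, so the ranking of terms within one corpus stays the same.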

The Benefits of Using TF-IDF

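  • Easy to calculate.
  • An easy way to extract the most descriptive keywords in a document.
  • Shows how distinctive and relevant a term is to your content compared with other documents.
  • Helps you find the terms that top-ranking pages use, which can support your rankings on Google.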
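As a rough illustration of extracting the most descriptive keywords, the sketch below ranks the words of one document in a toy corpus by TF-IDF. The three-document corpus, the small stop-word list and the helper names (`tokenize`, `top_keywords`) are made up for illustration; real tools also handle stemming and multi-word phrases.

```python
import math
import re
from collections import Counter

# A tiny, illustrative stop-word list (real tools use much larger ones)
STOP_WORDS = {"a", "an", "and", "the", "can", "too", "is", "of"}

def tokenize(text: str) -> list[str]:
    # Lowercase words made of letters, minus stop words
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def top_keywords(corpus: list[str], doc_index: int, k: int = 3) -> list[tuple[str, float]]:
    """Return the k terms of corpus[doc_index] with the highest TF-IDF (base-10 log)."""
    token_sets = [set(tokenize(doc)) for doc in corpus]
    words = tokenize(corpus[doc_index])
    n_docs = len(corpus)
    scores = {}
    for term, count in Counter(words).items():
        df = sum(1 for toks in token_sets if term in toks)  # >= 1: this doc contains it
        scores[term] = count / len(words) * math.log10(n_docs / df)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

corpus = [
    "coffee can cause headaches and coffee can raise blood pressure",
    "tea can cause headaches too",
    "the casino opened a new casino floor",
]
# "coffee" ranks first: it is frequent in document 0 and absent from the others,
# while "headaches" scores lower because document 1 uses it too
print(top_keywords(corpus, doc_index=0))
```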

Here’s one more example to show what TF-IDF is and how it can help. Say you are writing an article about the ‘side effects of coffee’ (we all know there are plenty, right?).

Before writing the content, you’ll need to know which topics are discussed in the articles that rank well on Google for the phrase ‘side effects of coffee’. Doing a TF-IDF analysis first shows you how to make your content relevant for that specific phrase; for example, it may reveal that you should include subtopics on addiction, headaches, caffeine and its effect on blood pressure.

Covering the topics that the TF-IDF exercise surfaces, which Google may already consider important for the phrase, will not only help you create relevant, good-quality content but may also help you rank better.

How to Optimize your Content Using TF-IDF on Ryte

  1. Sign up and log in (Ryte is 100% free).

[Screenshot: step 1, signing up for Ryte]

After you fill in the registration form and confirm your email address, you’ll be good to go.

  2. Click on ‘Content Success’, located on the left-hand side of the screen.

[Screenshot: step 2, ‘Content Success’ in the Ryte menu]

You will then get to this page:

[Screenshot: the Ryte Content Success page]


  3. Enter your keyword, choose the country and the language you are interested in, and then click on ‘Start Content Analysis’.

[Screenshot: step 3, starting the content analysis]

After a couple of seconds, Ryte shows the results. I used the keyword “Casino”, with English as the language and New Zealand as the location:

[Screenshot: Ryte TF-IDF results for “Casino”]

Keyword Recommendations:

[Screenshot: Ryte keyword recommendations]

Competition:

[Screenshot: Ryte competition results]

TF-IDF & Website Auditor

With this tool, we can discover keywords that are closely related to our target keywords, judging by what our top competitors use. After opening Website Auditor, head to ‘Content Analysis’ and then click on TF-IDF. Choose the page you want to optimise and enter your target keyword/s. Here’s what the tool does behind the scenes:

  1. Goes to Google’s search results and selects the 10 top-ranking competitors for your target keyword.
  2. Analyses the content of each competitor page.
  3. Compiles a complete list of the words and phrases the competitors use in their content.
  4. Calculates the TF-IDF for each term’s usage on each page, and each term’s average TF-IDF across the 10 pages.
  5. Calculates the TF-IDF for the usage of the same terms on your page.
  6. Builds a table of these keywords and a chart for you to look at.
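The tool’s internals aren’t documented here, but steps 3 to 6 can be approximated with a short sketch. It assumes the competitor pages have already been fetched as plain text (steps 1 and 2 are out of scope), treats the competitor pages plus your page as the corpus for IDF, averages each term’s TF-IDF over all competitor pages (counting 0 where a page doesn’t use the term), and sorts terms by how many competitor pages use them. These are assumptions, not Website Auditor’s actual method, and `tf_idf_table` is a hypothetical name.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase single words only; real tools also extract multi-word phrases
    return re.findall(r"[a-z]+", text.lower())

def tf_idf_table(competitor_pages: list[str], my_page: str) -> list[dict]:
    # Assumption: the corpus for IDF is the competitor pages plus your page
    tokenized = [tokenize(p) for p in competitor_pages + [my_page]]
    n_docs = len(tokenized)
    # DF: number of pages that contain each term at least once
    df = Counter(term for toks in tokenized for term in set(toks))

    def weights(toks: list[str]) -> dict[str, float]:
        # Length-normalized TF times base-10 IDF for every term on one page
        counts = Counter(toks)
        return {t: c / len(toks) * math.log10(n_docs / df[t]) for t, c in counts.items()}

    competitor_weights = [weights(toks) for toks in tokenized[:-1]]  # step 4
    my_weights = weights(tokenized[-1])                               # step 5
    all_terms = {t for w in competitor_weights for t in w}            # step 3

    rows = []
    for term in all_terms:
        per_page = [w.get(term, 0.0) for w in competitor_weights]
        rows.append({
            "term": term,
            "pages": sum(1 for w in competitor_weights if term in w),
            "avg": sum(per_page) / len(per_page),  # averaged over all competitor pages
            "mine": my_weights.get(term, 0.0),
        })
    # Most widely used terms first; ties broken alphabetically
    rows.sort(key=lambda r: (-r["pages"], r["term"]))
    return rows  # step 6: the data behind the table

competitors = [
    "coffee side effects include headaches and addiction",
    "caffeine addiction and headaches are common coffee side effects",
    "coffee can raise blood pressure",
]
for row in tf_idf_table(competitors, "coffee is tasty")[:3]:
    print(row)
```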

[Screenshot: the TF-IDF table in Website Auditor]

The list of terms you see is sorted by the number of competitor pages that use them, so the terms your competitors use most widely, which are likely the most important and relevant, appear at the top.

The Recommendation column gives you usage advice for each term that appears on the pages of 5 or more of the competitors:

  • Use more if the term’s TF-IDF on your page is below the competitors’ lowest value.
  • Use less if the term’s TF-IDF on your page is above the competitors’ highest value.
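The recommendation rule is simple enough to express as a function. This is a sketch of the stated rule only: the function name is hypothetical, and the text doesn’t say what the tool shows when your value falls inside the competitors’ range, so the `"No change"` label is an assumption.

```python
def recommendation(my_value: float, competitor_values: list[float], min_pages: int = 5) -> str:
    """Usage advice for one term, following the rule described above.

    my_value          -- the term's TF-IDF on your page
    competitor_values -- the term's TF-IDF on each competitor page that uses it
    min_pages         -- advice is only given for terms used on at least this many pages
    """
    if len(competitor_values) < min_pages:
        return ""  # too few competitors use the term: no advice
    if my_value < min(competitor_values):
        return "Use more"
    if my_value > max(competitor_values):
        return "Use less"
    return "No change"  # hypothetical label: your value is within the competitors' range

competitor_values = [0.010, 0.015, 0.020, 0.012, 0.018]
print(recommendation(0.0, competitor_values))    # Use more
print(recommendation(0.030, competitor_values))  # Use less
print(recommendation(0.015, competitor_values))  # No change
```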

You can even make changes to your page and implement these recommendations right in Website Auditor by going to Content Editor, where you can edit your content in HTML. Try playing around with the TF-IDF tool in Website Auditor yourself; I promise you’re in for more than a few exciting discoveries.

N.B.

One final word of caution: please don’t take every single recommendation in the TF-IDF dashboard literally.

The algorithm does its part by picking out the best terms for you and giving you usage advice, but before you make changes to your page, remember that whatever content you add has to offer value to the user.

In other words, don’t try to use this as a way to trick search engines into thinking your page is something it really isn’t; instead, use it as algorithmic inspiration for keyword ideas and improving your content.