
Japanese Text and Yoast SEO

15 June, 2017

A potential client recently asked us whether the Yoast SEO plugin handles Japanese text. The Yoast plugin for WordPress has over 3 million active installs and is widely regarded by developers as the gold standard among WordPress SEO plugins. While there does appear to be a problem with getting an accurate word count through the plugin, this isn’t really a major issue, as we will explain in this article. This topic is also a good place to look into the related issue of Japanese text readability for SEO and general web strategy.

What’s the Issue with Japanese text?

Most word counting tools are designed to look for spaces between words. That’s how words are delimited in Latin-based writing systems. The problem with applying this to Japanese is that most sentences are an unbroken string of kanji, hiragana, katakana and romaji characters, with no spaces between words. For example, the phrase:

The quick brown fox jumps over the lazy dog.

is recognized as 9 words.

However, the equivalent sentence in Japanese:

すばしっこい茶色の狐はのろまな犬を飛び越える。

is recognized as a single long word!
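
The difference is easy to reproduce. The snippet below is a minimal sketch of a space-based word counter, not the Yoast plugin's actual code; the function name `count_words_by_whitespace` is made up for this illustration.

```python
def count_words_by_whitespace(text: str) -> int:
    """Count words the way a naive, space-based counter would."""
    # str.split() with no arguments splits on any run of whitespace.
    return len(text.split())


english = "The quick brown fox jumps over the lazy dog."
japanese = "すばしっこい茶色の狐はのろまな犬を飛び越える。"

print(count_words_by_whitespace(english))   # 9
print(count_words_by_whitespace(japanese))  # 1
```

Getting a meaningful count for Japanese would need a tokenizer that understands the language, which is a much bigger job than splitting on spaces.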

Does Word Count Matter?

The Yoast plugin gives a real-time count of how many words are in any given blog post or page. The reason is probably connected to SEO folklore about optimum “keyword density” percentages.

If keyword density is a new term for you, it works like this. Imagine you have a blog post about “blue widgets” that is 500 words long, and your keyword phrase “blue widgets” appears in the article 5 times. In this example your keyword density would be 1%. Generally you want your keyword phrases to appear more than once, but search engines like Google worked out a long time ago that “more” doesn’t equal “better”. If your blog article repeats a keyword phrase excessively, there is a good chance that you are writing to game the system (a.k.a. “keyword stuffing”) rather than writing for human readers. If so, your article risks an SEO penalty.
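
As a rough sketch of the arithmetic, the function below counts occurrences of a phrase and divides by the total word count, which is the definition used in the example above. SEO tools do not all define keyword density the same way, so treat the function name `keyword_density` and the simple tokenizer as assumptions made for illustration only.

```python
import re


def keyword_density(text: str, phrase: str) -> float:
    """Return phrase occurrences as a percentage of total words."""
    # Very simple tokenizer: lowercase runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    # Slide a window of n words over the text and count exact matches.
    occurrences = sum(
        1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase_words
    )
    return 100.0 * occurrences / len(words)


# A 500-word dummy post that mentions "blue widgets" 5 times.
post = " ".join(["blue widgets"] * 5 + ["filler"] * 490)
print(keyword_density(post, "blue widgets"))  # 1.0
```

Note that this only works for space-separated languages; for Japanese text the word count in the denominator is exactly the number that is hard to get.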

To put it another way, there is a law of diminishing returns on keyword density. After you mention your target phrase a few times, you don’t get much additional benefit from using it again. Past a certain point the returns turn negative as penalties start to apply.

Google’s own Search Guru Matt Cutts explains the issue well.

There is some evidence that Google’s ranking formula now favors longer, “content rich” articles more than it did in the past. Having a word count tool might be useful from that perspective but, once again, to the extent that word count is actually a factor in search algorithms, the impact is likely small. The evidence suggests a lot of variability across different types of articles, and it might simply be the case that longer articles attract more organic inbound links from interested bloggers.

Should I Use Yoast Anyway?

In a word – YES!

The Yoast SEO plugin for WordPress can be configured to do a whole lot of things besides being just a glorified word count tool.

Here are a few, in no particular order:

  • Google search result snippet preview, so you can see exactly how your content will look when someone searches for it on Google.
  • Google Authorship verification for your website. Google Authorship is a way to link content you create with a Google+ profile – easily promoting your blog or latest news about your company and/or products and services you’re an expert at.
  • Synchronization with Google Webmaster Tools (now Google Search Console).
  • Automatically generate an XML sitemap file where you can list the web pages of your site to tell Google and other search engines about the organization of your site content and what pages to view for specific information.
  • Advanced indexing configuration such as removing a specific post, page, post type, or taxonomy from the sitemap, thus avoiding “duplicate content” penalties.
  • Social integration that lets you show the correct title, description, and image on Facebook using Open Graph meta data.
  • Allows adding Twitter cards in WordPress.
  • Easy meta robots configuration.
  • Improved canonical URL support, adding canonical to taxonomy archives, single posts and pages and the front page.
  • Automated internal linking suggestions (premium version).

The list is extensive.

What is Readability and Why Does it Matter?

In addition to all the “technical” SEO tweaking, the Yoast plugin does a great job of encouraging text readability, i.e. content that is easy for visitors to read and comprehend.

But hang on a minute. “What has that got to do with SEO,” you might be asking.

Is “readability” one of the factors of the Google ranking algorithm?

  • Directly – perhaps not but …
  • Indirectly – almost certainly

We know that between 2010 and 2015 Google actually had “readability” as a little-known filter feature of their search interface. It used to be possible to filter search results by basic, intermediate and advanced reading levels.

Furthermore, Google is a supporter of the Web Content Accessibility Guidelines (WCAG 2.0) initiative which includes the following recommendation:

3.1.5 Reading Level: When text requires reading ability more advanced than the lower secondary education level after removal of proper names and titles, supplemental content, or a version that does not require reading ability more advanced than the lower secondary education level, is available. (Level AAA)

So it seems at least possible that Google is giving some small boost to content that meets the lower secondary education level readability guideline. Unfortunately, there isn’t much empirical evidence to test the proposition. In fact, web accessibility consultant Karl Groves has stated “Better SEO is not an accessibility business case and this myth needs to go away”.

So if the direct SEO benefit of having highly “readable” content is arguable, what about the indirect benefit?

This is more tangible. We know Google wants to serve up search result content that users will actually READ, not just skim for a few seconds then bounce away. From this we can infer that the extent of user engagement with your content (“dwell time”) is at least to some extent a ranking factor in their organic search formula.

Furthermore, if visitors find content easy to read, they will likely share it online as well. This leads to more inbound links, more visitors, more sharing – a virtuous circle.

Last but not least, never lose sight of the fact that search engine results are just a means to an end: getting visitors to take some kind of conversion action on your site. That conversion action might be buying something, becoming a member, requesting a quote or signing up for a newsletter. You are not likely to achieve any of those goals if your text is mostly turgid and dense.

How is a Readability Score Calculated?

Most readability tests use a formula that extracts and compares at least some of the following:

  • Number of words
  • Number of unique words
  • Number of difficult words
  • Number of easy words
  • Number of short words
  • Number of long words
  • Number of sentences
  • Number of syllables
  • Number of monosyllabic words (1 syllable)
  • Number of polysyllabic words (3 syllables or more)

The Yoast plugin uses the Flesch Reading Ease Score, but there are others, such as:

  • Dale-Chall Readability Index
  • Coleman-Liau Index
  • Gunning Fog Index

Flesch Reading Ease Test Formula
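
The standard Flesch Reading Ease formula for English is 206.835 − 1.015 × (total words ÷ total sentences) − 84.6 × (total syllables ÷ total words), and higher scores mean easier text. The sketch below simply evaluates that formula from three counts. It assumes the counts are supplied by the caller, because counting syllables reliably is the hard part, and it makes no claim about how the Yoast plugin implements the score internally.

```python
def flesch_reading_ease(
    total_words: int, total_sentences: int, total_syllables: int
) -> float:
    """Evaluate the standard (English) Flesch Reading Ease formula."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(100, 5, 150)
print(round(score, 1))  # 59.6
```

Every input to the formula depends on knowing where words begin and end, which is why it cannot be applied directly to Japanese text.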

Japanese Text Readability Scores

Various researchers have attempted to apply similar techniques to Japanese writing in order to come up with an equally objective way of measuring the readability of Japanese texts. To work around the lack of word boundaries within sentences, Japanese academics such as Prof. Yoshihiko Hayashi (1992) proposed a formula based on the proportions of kanji, hiragana, katakana, romaji and other symbols in a text as a proxy for readability.
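
The inputs to such a formula are easy to compute, because each script occupies its own range of Unicode code points. The following is a minimal sketch that only measures the proportions; it does not reproduce Hayashi's formula, whose coefficients are not given here. The function names and the exact code point ranges are simplifying assumptions (for example, rare kanji outside the main CJK block and full-width Latin letters are counted as "other").

```python
import unicodedata
from collections import Counter

CATEGORIES = ("kanji", "hiragana", "katakana", "romaji", "other")


def classify_char(ch: str) -> str:
    """Assign a character to a script using basic Unicode block ranges."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:  # main CJK Unified Ideographs block only
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "romaji"
    return "other"


def script_proportions(text: str) -> dict:
    """Return the share of each script among non-whitespace characters."""
    # NFC keeps voiced kana such as が as a single code point.
    chars = [c for c in unicodedata.normalize("NFC", text) if not c.isspace()]
    counts = Counter(classify_char(c) for c in chars)
    total = len(chars)
    return {name: (counts[name] / total if total else 0.0) for name in CATEGORIES}


print(script_proportions("すばしっこい茶色の狐はのろまな犬を飛び越える。"))
```

A text with a high share of kanji tends to be denser reading than one written mostly in hiragana, which is the intuition behind using these proportions as a proxy.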

More recent work by Satoshi Sato, Suguru Matsuyoshi and Yohsuke Kondoh (2008) of Nagoya University tackles the issue from a more novel angle. They compiled a Textbook Corpus of 1,478 sample passages of Japanese text extracted from 127 textbooks used in elementary school, junior high school, high school and university. From this data they were able to calculate the probability of any given kanji character appearing at a given educational reading level, i.e. rarely used characters are unlikely to appear in lower grade level materials. Sato (2014) later extended this work from a single-character (unigram) model to a two-character (bigram) model, with improved predictive accuracy.
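
To give a feel for the idea, here is a toy sketch of a character unigram model: estimate how likely each character is at each grade level, then pick the level under which a new text is most probable. This is an illustration under stated assumptions, not Sato's implementation. The two-level corpus, the sample passages, the add-one smoothing and the function names are all invented for the example, and the real work uses the 1,478-passage Textbook Corpus described above.

```python
import math
from collections import Counter


def train_unigram_models(corpus: dict) -> dict:
    """Build smoothed per-level character probabilities from sample passages."""
    vocab = {ch for passages in corpus.values() for p in passages for ch in p}
    models = {}
    for level, passages in corpus.items():
        counts = Counter(ch for p in passages for ch in p)
        # Add-one smoothing, with one extra slot reserved for unseen characters.
        denom = sum(counts.values()) + len(vocab) + 1
        models[level] = {
            "probs": {ch: (counts[ch] + 1) / denom for ch in vocab},
            "unseen": 1 / denom,
        }
    return models


def estimate_level(text: str, models: dict):
    """Return the level whose model gives the text the highest log-likelihood."""
    def log_likelihood(model: dict) -> float:
        return sum(math.log(model["probs"].get(ch, model["unseen"])) for ch in text)

    return max(models, key=lambda level: log_likelihood(models[level]))


# Hypothetical toy corpus: level 1 = early elementary, level 9 = junior high.
toy_corpus = {
    1: ["いぬがいる", "ねこがいる"],
    9: ["経済政策を議論する", "憲法改正を検討する"],
}
models = train_unigram_models(toy_corpus)
print(estimate_level("いぬとねこ", models))      # 1
print(estimate_level("経済を検討する", models))  # 9
```

A bigram version would score pairs of adjacent characters instead of single characters, at the cost of needing far more training text.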

The outcome of Sato’s work is this brilliant online tool that can be used to test the Japanese Readability Score of any article or post.

You’ll need to copy and paste the text into the box and click the button on the top left for the results (the button on the top right clears the box).

 

 

Figure: A Japanese news story about Cameron Diaz.

Figure: The news story posted into the Sato Readability Tool (Satoshi Sato, Nagoya University).

 

The Cameron Diaz news story has a readability level of 9 (3rd year of Junior High) based on the T13 educational level scale:

  • 1 = 1st Year Elementary Level
  • 2 = 2nd Year Elementary Level
  • 3 = 3rd Year Elementary Level
  • 4 = 4th Year Elementary Level
  • 5 = 5th Year Elementary Level
  • 6 = 6th Year Elementary Level
  • 7 = 1st Year Junior High Level
  • 8 = 2nd Year Junior High Level
  • 9 = 3rd Year Junior High Level
  • 10 = 1st Year Senior High Level
  • 11 = 2nd Year Senior High Level
  • 12 = 3rd Year Senior High Level
  • 13 = University Level

The story has a relative reading level of 4 (somewhat easy) on the B9 Comparative Rating scale:

  • 1 = Very Easy
  • 2 = Easy
  • 3 = Quite Easy
  • 4 = Somewhat Easy
  • 5 = Average
  • 6 = Somewhat Difficult
  • 7 = Quite Difficult
  • 8 = Difficult
  • 9 = Very Difficult

Readability is an important topic for content creators. Although there are not many tools available for checking the readability of Japanese language text, the tool from Satoshi Sato of Nagoya University helps fill the gap.