SEO (with tips for MediaWiki and WordPress)

Wiki.TerraBase.info
Revision as of 10:46, 23 January 2020 by Root

There are so many articles, books, software packages, etc. devoted to SEO (Search Engine Optimization). While there is a lot of great information, the vast majority of content on the subject seems to be devoted to the last 5%, the fine tuning of a website. What about the other 95% of SEO, the basics? It ain't glamorous, but it is important.

And don't forget about CMS (Content Management Systems) that allow for easy publishing of websites. How does one control and make changes to the items Search Engines deem important? Should it all be left up to software to automatically manage that? Nope. So again, how is that controlled and manipulated?

An element of the internet that is missing today is a Google-sized Directory Service: not a Search Engine, but a directory that is maintained by people and has a certain amount of bias towards high quality websites. ODP (Open Directory Project) is about all that's left of this, although it could be argued that Wikipedia is a reflection of this.

Some topics covered in this article:

  • SEO (Search Engine Optimization)
  • Tips for MediaWiki and WordPress related to SEO
  • SiteMaps
  • Robots (as in GoogleBot)
  • Google Search Console (https://en.wikipedia.org/wiki/Google_Search_Console)
  • "Web master" tools for other search sites (Bing, etc.)
  • Other Google Services (Google Analytics, Google Insights (merged into Google Trends))

Basics

There are some basics that apply to every website and web page. First, CONTENT!!! Notice the emphasis on that. If the purpose of your website is to garner traffic for the sake of traffic or selling advertising space, then fork you (watch The Good Place on NBC and you'll understand that term if you don't already). For all the other people on the planet who are trying to create or write something useful or artistic, you're probably already doing the right thing and there doesn't need to be any further explanation. Although I would point out that spelling, grammar, etc. are important too.

Beyond that, why not start with the king of search, Google, and find out what they have to say on the subject. After all, they are the search engine. So for right now, forget about all of those books that have some interesting information on the last five percent of important things to do for SEO and focus on the first 95%.

  • Quality Content. It's self explanatory.
  • Title Element for each page: Make it accurate and unique for each page, but not long winded.
  • Meta Element, Description Attribute: A simple, concise sentence that summarizes a web page. Avoid the same description on every page.
  • Meta Element, Keywords Attribute: Per many sources (noted in the Wikipedia article on the Meta Element), it has been almost completely ignored by search engines since the late noughties.
  • Additional "Tips of the Current Era": Look at extensions for MediaWiki, like WikiSEO (discussed later), and the different settings it has. These are some fairly clear indications as to what is important to SEO.
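The Title and Description items above are easy to check mechanically. Below is a minimal sketch in Python (standard library only) that reads one page's HTML and reports a missing title, a long winded title, or a missing description. The 60 character limit, the names HeadInfo and check_page, and the sample page are all assumptions made for illustration, not anything Google or MediaWiki prescribes.

```python
from html.parser import HTMLParser


class HeadInfo(HTMLParser):
    """Collects the <title> text and the meta description of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def check_page(html, max_title_len=60):
    """Return a list of problems found in the page head (empty list = fine).

    max_title_len is an arbitrary, hypothetical threshold for "long winded".
    """
    info = HeadInfo()
    info.feed(html)
    info.close()
    problems = []
    title = info.title.strip()
    if not title:
        problems.append("missing title")
    elif len(title) > max_title_len:
        problems.append("title is long winded")
    if not (info.description or "").strip():
        problems.append("missing meta description")
    return problems


# Hypothetical sample page used only to exercise the checker.
sample = """<html><head>
<title>SEO (with tips for MediaWiki and WordPress)</title>
<meta name="description" content="Basic SEO notes for MediaWiki and WordPress.">
</head><body></body></html>"""

print(check_page(sample))
```

Running check_page over every page of a site and comparing the collected titles and descriptions would also catch the "same description on every page" problem.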

MediaWiki

HTML Meta and Title Extension

To change the <Title> Element, add this extension: Add HTML Meta and Title

Using the "Magic Word" PAGENAME along with the $wgSitename Variable (in LocalSettings.php), one can keep the automatically generated Title Name and add to it:

<seo title="{{PAGENAME}} - WhatEverAdditionalText" metakeywords="WhatEverKeyWords,AnotherKeyWord,Etc." metadescription="WhatEverDescription,AnotherDescription,Etc." />

I tried other Meta Attributes (see W3schools for more) and none of them worked, even without "meta" prepended to the actual attribute name. google-site-verification is another item that does work. WikiSEO is a more advanced extension that does a bunch more.

Not so much for SEO, but nice because it is displayed on every page, is the "Tag Line" (example on the Wikipedia site: "From Wikipedia, the free encyclopedia"). It is stored in the page MediaWiki:Tagline, which by default contains "From {{SITENAME}}" (SITENAME being a "Magic Word" for the name of the website). It can be edited to say anything, although "Create Source" must be selected first before it can be edited.

WikiSEO Extension

WikiSEO Extension:

MediaWiki

SiteMap Script and Cron or a System Timer
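MediaWiki ships its own maintenance/generateSitemap.php script, and that is what a Cron job or System Timer would normally run. The sketch below is not that script. It is only a small Python illustration of what a sitemap file contains, using the format from sitemaps.org. The base URL and the list of titles are hypothetical, and the URL layout assumes the "/$1" style article path described under Friendly URL.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, titles):
    """Return sitemap XML (as a string) with one <url> entry per page title."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for title in titles:
        # MediaWiki URLs use underscores in place of spaces.
        path = quote(title.replace(" ", "_"))
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base_url}/{path}"
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical site and titles, for illustration only.
print(build_sitemap("https://wiki.example.org", ["Main Page", "DD-WRT"]))
```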

Friendly URL

First, MediaWiki has some advice on short URLs here: https://www.mediawiki.org/wiki/Manual:Short_URL

They also have a warning about putting MediaWiki in the root of a website or virtual host as opposed to a sub directory (i.e. https://MyWiki.com vs. https://MyWiki.com/WikiFolder; they recommend the latter). The explanation they give is here: https://www.mediawiki.org/wiki/Manual:Wiki_in_site_root_directory, and I must say they make it a bit more threatening than it needs to be, at least as far as not being able to create articles with certain names (which isn't obvious in the way they word it). Although, a workaround I had to use to get Apache mod_rewrite to work sort of points to something deeper that relies on what they're recommending. Plus Wikipedia uses that format too, so why spit into the wind?

Unlike WordPress, there are no extensions that change the typical "index.php" URL that MediaWiki displays. Well, there are (ShortURL, URLShortener, Surl, etc.), but none of them do it without the need to add code to an Apache configuration file or .htaccess file (look them up). So why not just do it without the extensions (as they don't seem to make it easy like the WordPress extensions)? Turns out it is fairly easy. A great tool to make it easier is the MediaWiki ShortURL Builder (https://shorturls.redwerks.org/). But a word of caution if a MediaWiki site is "private": that tool will not be able to access the section of the wiki necessary to create the proper script, so make sure this setting is set correctly while using it: $wgGroupPermissions['*']['read'] = true; (change it back to false afterwards to make it a private wiki again).

In an Apache configuration file, inside a VirtualHost Directive (making sure that Apache has the mod_rewrite module available):

RewriteEngine On

# Thumbnail rules come first: the catch-all index.php rule at the bottom ends in [L] and would otherwise match missing thumbnails before these rules are reached.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^/?images/thumb/[0-9a-f]/[0-9a-f][0-9a-f]/([^/]+)/([0-9]+)px-.*$ %{DOCUMENT_ROOT}/thumb.php?f=$1&width=$2 [L,QSA,B]

RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^/?images/thumb/archive/[0-9a-f]/[0-9a-f][0-9a-f]/([^/]+)/([0-9]+)px-.*$ %{DOCUMENT_ROOT}/thumb.php?f=$1&width=$2&archived=1 [L,QSA,B]

# Everything else that is not an existing file or directory is handed to MediaWiki.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^(.*)$ %{DOCUMENT_ROOT}/index.php [L]
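To see what the thumbnail rules actually capture, here is a rough Python sketch that applies the same regular expression as the first thumbnail RewriteRule above (the non-archive one) to a URL path. It only imitates the pattern matching. It leaves out the DOCUMENT_ROOT prefix, the file and directory conditions, and the escaping done by the B flag, and the function name and sample paths are made up for illustration.

```python
import re

# Same pattern as the non-archive thumbnail RewriteRule in the Apache config.
THUMB_RE = re.compile(
    r"^/?images/thumb/[0-9a-f]/[0-9a-f][0-9a-f]/([^/]+)/([0-9]+)px-.*$"
)


def thumb_target(url_path):
    """Return the thumb.php query the rule would build, or None if no match."""
    match = THUMB_RE.match(url_path)
    if match is None:
        return None
    filename, width = match.groups()  # $1 and $2 in the RewriteRule
    return f"/thumb.php?f={filename}&width={width}"


# Hypothetical request paths.
print(thumb_target("/images/thumb/a/ab/Example.png/120px-Example.png"))
print(thumb_target("/Main_Page"))
```

The first path is turned into a thumb.php request for Example.png at width 120. The second does not match the pattern, so in Apache it would fall through to the index.php rule.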

And in the LocalSettings.php file:

### For Friendly URLs, the following settings are used in conjunction with settings in the VirtualHost Directives in HTTPD.conf (obtained from https://shorturls.redwerks.org/)
### Additional notes from: https://www.mediawiki.org/wiki/Manual:Short_URL/Apache
### AND: An Alias Directive was also added (this was a key point in making it work), per recommendations from MediaWiki.
### Web Root Directory Note: https://www.mediawiki.org/wiki/Manual:Wiki_in_site_root_directory
###$wgScriptPath is defined earlier in the file at its default location
###$wgScriptPath = "";
$wgScriptExtension = ".php";
$wgArticlePath = "/$1";
$wgUsePathInfo = true;

###$wgEnableUploads is defined earlier in the file at its default location
###$wgEnableUploads  = true;
$wgGenerateThumbnailOnParse = false;

And finally, a workaround I used to address the above mentioned issue about placing the wiki in the root directory of a website. This goes in an Apache configuration file (inside a VirtualHost Directive):

Alias			/wiki /var/www/html/Wiki.TerraBase.info

Just so you know, there are several ways MediaWiki displays content: .../index.php?WhatEverQuery or .../index.php/WhatEverPageTitle, explained here: https://www.mediawiki.org/wiki/Manual:Short_URL

A lot of the above for Apache is explained here: https://www.mediawiki.org/wiki/Manual:Short_URL/Apache The one thing they didn't address was the question: What if the wiki is in the root directory of a website or virtual host? I.e., the instructions they wrote assume one followed "best practices" by having the wiki in a sub-directory of a website. My workaround using the Alias Directive is noted above.

WordPress

WordPress

Yoast and others


Other Thoughts

$wgWhitelistReadRegexp

...ran into an issue with the $wgWhitelistReadRegexp variable. The regular expressions written did not work as they should have. Below is a note posted on the MediaWiki site that is self explanatory:

Hyphens and Periods with MediaWiki 1.33 and PHP 7.3

There might be an issue with $wgWhitelistReadRegexp when attempting to allow hyphens or periods (and possibly other special characters).


For the below noted items, on a private MediaWiki site the following setting was configured: $wgGroupPermissions['*']['read'] = false;

As expected, anonymous users were successfully able to access articles in the Main NameSpace if $wgWhitelistReadRegexp was set to this: '/ :*/'

However, anonymous users were blocked from articles that contained no spaces and contained at least one hyphen in the title. Logged in users could view the article normally. Articles with spaces in the titles and a hyphen were visible for anonymous users. Example article title: DD-WRT

Adding this to the $wgWhitelistReadRegexp array did not correct the issue: '/ :.*\-.*/' (neither did '/ :*-/', '/ :*\-/', etc.)

That was puzzling, because the regular expression .*\-.* works as expected with the TitleBlacklist extension, so I anticipated it would also work in the $wgWhitelistReadRegexp array. It did not. I tried others, but nothing allowed anonymous users to view the article.

Periods caused a similar issue with article titles that had spaces or no spaces. No other accepted title characters like ), (, ;, etc. caused any issues.


Is there a different regular expression syntax used with $wgWhitelistReadRegexp or is this a bug?
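One thing that may be worth checking is what the patterns themselves say under ordinary regular expression rules. The Python sketch below assumes (and this is an assumption, not something verified against the MediaWiki source) that each entry is applied as an unanchored match against the page title. Python's re module is not PHP's PCRE, but for patterns this simple the two should behave the same. The function name and the third title are made up for illustration.

```python
import re


def is_whitelisted(title, patterns):
    """True if any pattern matches anywhere in the title (unanchored search)."""
    return any(re.search(p, title) for p in patterns)


# The bodies of '/ :*/' and '/ :.*\-.*/' without the PHP delimiters.
patterns = [r" :*", r" :.*\-.*"]

for title in ["Main Page", "DD-WRT", "DD WRT-Notes"]:
    print(title, is_whitelisted(title, patterns))
```

Read literally, ' :*' means "a space followed by zero or more colons", so it can only match titles that contain a space, and ' :.*\-.*' needs a space, then a colon, then a hyphen somewhere after it. Under that reading DD-WRT has nothing for either pattern to match, while titles with a space pass. That lines up with the hyphen behaviour described above, but it does not explain the period behaviour, so treat it as a hint rather than an answer.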

Blacklist

To solve the issue with the $wgWhitelistReadRegexp variable, use the TitleBlacklist Extension (well, not really a solution, but a prevention):

###wfLoadExtension is defined earlier in the file at its default location
###wfLoadExtension( 'TitleBlacklist' );
### This extension is necessary to block creation of articles with a hyphen / dash ( - ) or a period in the title as the wgWhitelistReadRegexp setting can't handle periods (it can handle dashes)
$wgTitleBlacklistSources = array(
    array(
         'type' => 'localpage',
         'src'  => 'MediaWiki:Titleblacklist',
    ),
### The below items can be used to define other sources of a blacklist, but for my purposes a NameSpace is fine.
###    array(
###         'type' => 'url',
###         'src'  => 'https://meta.wikimedia.org/w/index.php?title=Title_blacklist&action=raw',
###    ),
###    array(
###         'type' => 'file',
###         'src'  => '/home/wikipedia/blacklists/titles',
###    ),
);
### The above noted MediaWiki:Titleblacklist has the following items added
### This is for the hyphen / dash character: .*\-.*
### This is for the period: .*\..*
### ALL Titleblacklist entries seem to start with .
### In the above examples, which start out with .*WhatEverWord.*, then for special characters, prepend it with a backslash ( \ ) and then the character
###
### This following permission allows the above settings to apply to the root / administrator too, otherwise that account is free to create any titled article.
$wgGroupPermissions['sysop']['tboverride'] = false;
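As a quick sanity check of the two blacklist entries noted in the comments above, here is a small Python sketch that tests a few titles against them. It assumes the extension matches each entry against the whole title (an assumption; with .* on both ends it makes no difference here), and the function name and sample titles are only for illustration.

```python
import re

# The two entries noted above for MediaWiki:Titleblacklist.
BLACKLIST = [r".*\-.*", r".*\..*"]


def is_blocked(title):
    """True if the whole title matches any blacklist entry."""
    return any(re.fullmatch(p, title) for p in BLACKLIST)


for title in ["DD-WRT", "Wiki.TerraBase.info", "Main Page"]:
    print(title, is_blocked(title))
```

The first two titles contain a hyphen or a period and are blocked. "Main Page" contains neither and is allowed.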

