SEO (with tips for MediaWiki and WordPress)



*SEO (Search Engine Optimization)

*Tips for MediaWiki and WordPress related to SEO

*SiteMaps (and don't forget Image SiteMaps too)


*Robots (as in GoogleBot) - robots.txt (and yes, Google has a wizard / generator for that)

*Google Search Console (https://en.wikipedia.org/wiki/Google_Search_Console)


*WebMaster tools for Google and other search sites (Bing, etc.)

*Other Google Services (Google Analytics, Website Optimizer, Google Insights (merged into Google Trends), [https://transparencyreport.google.com/safe-browsing/search?hl=en Transparency Report])

*Custom Error Pages

*Special Consideration for Mobile Devices

*Other analysis tools like Sawmill

*Broken Links: Google recommends a custom 404 error page, and that's good, but what about fixing the broken links themselves? (Plugin or Extension, anyone?)

*Google states the following:

**We do not use PRIORITY settings in SiteMap XML Files

**Date Modified (lastmod) is used, but not the way you might think: it seems Google also compares it against the actual content and may penalize sites that abuse the modified or edited date
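The two sitemap points above can be illustrated with a minimal sketch, assuming Python 3 and only its standard library: it writes a sitemap urlset whose entries carry loc and lastmod but no priority, with lastmod taken from each page's real last edit rather than the time the file was generated. The build_urlset helper and the page list are hypothetical, for illustration only; MediaWiki's generateSitemap.php and the WordPress plugins produce their own files.

<syntaxhighlight lang="python">
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(pages):
    """Build a sitemap urlset from (url, last_edit) pairs.

    Hypothetical helper: only loc and lastmod are written; priority is left out.
    """
    ET.register_namespace("", SITEMAP_NS)  # serialise as xmlns="..." (no prefix)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, last_edit in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        # lastmod = time of the last real content change, in W3C datetime format
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = (
            last_edit.astimezone(timezone.utc).isoformat(timespec="seconds")
        )
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page list: in practice the dates would come from the revision history
pages = [("https://WhatEverSite/wiki/Main_Page",
          datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc))]
print(build_urlset(pages))
</syntaxhighlight>

The point of the design is that lastmod only changes when the content does, which is the behaviour Google appears to reward.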

==Basics==


*Additional "Tips of the Current Era": Look at extensions for MediaWiki, like [[mediawikiwiki:Extension:WikiSEO|WikiSEO]] (discussed later), and the different settings they offer. Those settings give a fairly clear indication of what matters for SEO.

*Create a robots.txt file in the root of the website that includes a reference to a sitemap

*<syntaxhighlight lang="text">

Sitemap: https://Wiki.TerraBase.info/sitemap/sitemap-index-Wiki.TerraBase.info.xml

</syntaxhighlight>

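One way to confirm that the Sitemap line is actually recognized is Python's standard urllib.robotparser, whose site_maps() method (Python 3.8 and later) returns the Sitemap URLs it found, or None. A minimal sketch, assuming the robots.txt text is already at hand (no network access); the sitemaps_in helper is a hypothetical name. It also shows that a line missing the colon after "Sitemap" is silently ignored.

<syntaxhighlight lang="python">
from urllib.robotparser import RobotFileParser


def sitemaps_in(robots_txt):
    """Return the Sitemap URLs urllib.robotparser finds in robots.txt text ([] if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


good = "Sitemap: https://Wiki.TerraBase.info/sitemap/sitemap-index-Wiki.TerraBase.info.xml"
# Without the colon the line is not recognised as a Sitemap directive at all.
bad = "Sitemap https://Wiki.TerraBase.info/sitemap/sitemap-index-Wiki.TerraBase.info.xml"

print(sitemaps_in(good))
print(sitemaps_in(bad))  # prints []
</syntaxhighlight>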

===Robots===

For a simple site, there aren't many restrictions that need to be put in place from a technical perspective. To see what the "Big Boys" do with their robots.txt file, just go to their site and view the file (https://www.Microsoft.com/robots.txt).

Sample robots.txt file for a Wiki<syntaxhighlight lang="text">

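# Rules for all crawlers: block everything except /wiki
User-agent: *
# "Disallow: /*" would be equivalent to "Disallow: /"
Disallow: /
Allow: /wiki
# Seconds between requests; interpretation depends on the bot (Bing honours it, Google ignores it)
Crawl-delay: 10

# Googlebot follows only the most specific group that names it, so repeat the rules here
User-agent: googlebot
Disallow: /
Allow: /wiki

# Host: (preferred domain) is non-standard and not supported by Google
# Host: www.WhatEverSiteName.com

Sitemap: https://WhatEverSite/sitemap/sitemap.xml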

</syntaxhighlight>For technical details on the robots.txt file: https://developers.google.com/search/reference/robots_txt

There are several ways to test a robots.txt file, including Google's own tools; here's another: https://technicalseo.com/tools/robots-txt/
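Why "Disallow: /" combined with "Allow: /wiki" still lets Googlebot into /wiki: Google (and RFC 9309) apply the most specific, i.e. longest, matching rule regardless of its order in the file, and Allow wins a tie. Python's urllib.robotparser instead uses the first matching rule in file order, so it is not a faithful stand-in for Google on this sample. A simplified sketch, assuming plain path prefixes only (no "*" or "$" wildcards); is_allowed is a hypothetical helper, not Google's code.

<syntaxhighlight lang="python">
def is_allowed(rules, path):
    """Decide whether `path` may be crawled under a list of (directive, value) rules.

    Simplified longest-match rule: plain prefix matching, no "*" or "$" wildcards.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, value in rules:
        if not value or not path.startswith(value):
            continue  # an empty "Disallow:" matches nothing
        is_allow = directive.lower() == "allow"
        # A longer match wins; on equal length, Allow beats Disallow.
        if len(value) > best_len or (len(value) == best_len and is_allow):
            best_len = len(value)
            allowed = is_allow
    return allowed


rules = [("Disallow", "/"), ("Allow", "/wiki")]
print(is_allowed(rules, "/wiki/Main_Page"))     # True: "/wiki" is longer than "/"
print(is_allowed(rules, "/index.php?title=X"))  # False: only "/" matches
</syntaxhighlight>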

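Indexing can also be controlled with meta tags in the HTML head section of each page;<syntaxhighlight lang="text">
<meta name="robots" content="noindex" />
<meta name="googlebot" content="noarchive,nosnippet" />
</syntaxhighlight><br />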
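The directives a page actually sends can be checked by parsing its HTML. A minimal sketch, assuming Python 3's standard html.parser and page HTML that has already been downloaded: it collects the directives that apply to one crawler, which obeys both the generic robots tag and a tag carrying its own name (e.g. googlebot). RobotsMetaParser and robots_directives are hypothetical names.

<syntaxhighlight lang="python">
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from meta name="robots" and crawler-specific meta tags."""

    def __init__(self, crawler):
        super().__init__()
        self.names = {"robots", crawler.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here by the default
        # handle_startendtag as well.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() in self.names:
            for directive in (attrs.get("content") or "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html, crawler="googlebot"):
    parser = RobotsMetaParser(crawler)
    parser.feed(html)
    return parser.directives


page = """<head>
<meta name="robots" content="noindex" />
<meta name="googlebot" content="noarchive,nosnippet" />
</head>"""
print(sorted(robots_directives(page)))             # ['noarchive', 'noindex', 'nosnippet']
print(sorted(robots_directives(page, "bingbot")))  # ['noindex']
</syntaxhighlight>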

==MediaWiki==


Using the "[[mediawikiwiki:Help:Magic_words|Magic Word]]" PageName along with the $wgSiteName Variable (in LocalSettings.php), one can keep the automatically generated Title Name and add to it;<syntaxhighlight lang="text">
<seo title="{{PAGENAME}} - WhatEverAdditionalText" metakeywords="WhatEverKeyWords,AnotherKeyWord,Etc." metadescription="WhatEverDescription,AnotherDescription,Etc." />
</syntaxhighlight>Other Meta Attributes (see [https://www.w3schools.com/tags/tag_meta.asp W3schools] for more) were tried and none of them worked, even without "meta" prepended to the actual attribute name. google-site-verification is another attribute that does work. WikiSEO is a more advanced extension that does a lot more. It can add a description, which is useful because MediaWiki doesn't include a description by default, and Google (and others) tend to use the description over the page contents (for example in the search result snippet), so it gives one the option to customize the Description META Tag.
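Since several attributes silently did nothing, it helps to check what actually ends up in the rendered page. A minimal sketch, assuming Python 3's standard html.parser and the page's HTML already saved or downloaded (for example via the browser's View Source). HeadSEOParser and title_and_description are hypothetical names, not part of any MediaWiki extension.

<syntaxhighlight lang="python">
from html.parser import HTMLParser


class HeadSEOParser(HTMLParser):
    """Pull the title text and the meta name="description" content out of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() == "description":
                self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data  # title text may arrive in several chunks


def title_and_description(html):
    parser = HeadSEOParser()
    parser.feed(html)
    return parser.title.strip(), parser.description


page = """<head><title>Main Page - WhatEverAdditionalText</title>
<meta name="description" content="WhatEverDescription" /></head>"""
print(title_and_description(page))
</syntaxhighlight>

A None description means no description meta tag was emitted for that page.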

Not so much for SEO, but nice because it is displayed on every page, is the "Tag Line" (on the Wikipedia site: "From Wikipedia, the free encyclopedia"). It comes from the system message page MediaWiki:Tagline (by default containing "From <nowiki>{{SiteName}}</nowiki>", SiteName being a "Magic Word" for the name of the website), which can be edited to say anything (although "Create source" must be selected before it can be edited the first time).


</syntaxhighlight>This script generates a new XML SiteMap (Google differentiates between a site map for human readers and one for machines);<syntaxhighlight lang="text">


php /var/www/html/Wiki.TerraBase.info/maintenance/generateSitemap.php --server=https://wiki.terrabase.info --fspath=/var/www/html/Wiki.TerraBase.info/sitemap --urlpath=sitemap --identifier=Wiki.TerraBase.info --compress=no --skip-redirects

</syntaxhighlight>The output file can't be given a custom name (such as sitemap.xml); the script generates the names itself (sitemap-WhatEverWebSiteName-NS_0-0.xml, followed by others), but the "index" file is the important one, since it lists all the others.

It can be run manually or by a systemd timer or a cron job (see Other Thoughts for creating a systemd timer on CentOS 7 or 8).
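To check what the script actually wrote, the index file can be read back and the child sitemaps it lists printed. A minimal sketch, assuming Python 3 and its standard xml.etree.ElementTree; the sample index below is made up to follow the file naming described here and is not real generateSitemap.php output. child_sitemaps is a hypothetical helper.

<syntaxhighlight lang="python">
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml):
    """Return the loc URLs listed in a sitemap index document (str or bytes)."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


# Illustrative index only, following the naming pattern of the generated files
index_xml = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://wiki.terrabase.info/sitemap/sitemap-Wiki.TerraBase.info-NS_0-0.xml</loc>
  </sitemap>
</sitemapindex>"""

for url in child_sitemaps(index_xml):
    print(url)
</syntaxhighlight>

For the real file, pass its bytes instead, e.g. child_sitemaps(open("/var/www/html/Wiki.TerraBase.info/sitemap/sitemap-index-Wiki.TerraBase.info.xml", "rb").read()), assuming the fspath and identifier from the command above.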


In Google Search Console, a SiteMap can be deleted by selecting it and then using the ... menu at the upper right. Also, several people have noted that the interface can show a "Couldn't fetch" error message that usually just means "pending" (click around, come back later, and it should show success).

AutoSitemap Extension;

For a systemd timer, use systemctl list-timers --all to find the timer;<syntaxhighlight lang="text">

### This extension causes an error message with VisualEditor, but the page saves and the SiteMap File is created

### A strategy might be to enable manually when a new SiteMap File is needed.


Yoast and others

Google XML Sitemaps Plugin

Udinra All Image Sitemap Plugin

WPMUDEV Plugins (various)

The Google XML Sitemaps plugin does get along with Yoast, sort of: instead of sitemap_index.xml, it uses the sitemap.xml file name.

WP Sitemap Page generates a human-readable sitemap if one puts its shortcode on each page to be "sitemapped". Note that this is not for search engines.

Other SiteMap plugins focus on very specific search services, like Google News (e.g. XML Sitemap and Google News).

Yoast doesn't like to "share" the sitemap_index.xml URL. An Apache rewrite could change that, so that the individual Yoast-generated sitemaps are used with a static sitemap_index.xml. Or, better yet, define a different index name in the robots.txt file and notify Google of it; but with that choice, Yoast's automatic notification (AKA ping) of Google would need to be modified. So why not keep Yoast "stock" and make the change at the Apache level? A quick bit of experimenting showed that a physical file in the web root overrides anything generated by WordPress (WordPress's standard .htaccess rules only hand requests for files that don't exist to WordPress). Copy the MediaWiki format or follow these instructions: https://www.google.com/sitemaps/protocol.html
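A static sitemap_index.xml for the web root could be generated with a short script instead of being written by hand. A minimal sketch, assuming Python 3's standard library and the sitemaps.org index format; the child sitemap URLs (post-sitemap.xml, page-sitemap.xml) are assumptions modeled on Yoast's usual file names, so list whatever the site actually serves. build_sitemap_index is a hypothetical helper.

<syntaxhighlight lang="python">
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(child_urls, lastmod=None):
    """Return a sitemap index document (as a string) listing the given child sitemaps."""
    ET.register_namespace("", SITEMAP_NS)  # serialise as xmlns="..." (no prefix)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in child_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        if lastmod is not None:
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod.isoformat(timespec="seconds")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(index, encoding="unicode")


# Hypothetical child sitemaps: replace with what the plugin really generates
children = [
    "https://WhatEverSite/post-sitemap.xml",
    "https://WhatEverSite/page-sitemap.xml",
]
print(build_sitemap_index(children, datetime.now(timezone.utc)))
</syntaxhighlight>

The output could be redirected into the web root, e.g. python3 make_sitemap_index.py > /var/www/html/WhatEverSite/sitemap_index.xml (script name and path are hypothetical).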

Taxonomy: In WordPress, taxonomy is a blanket term that includes Categories and Tags (a "sub category", equivalent to Keywords). These items are used in WordPress's internal searches.

<br />

==Other Google Services==


</syntaxhighlight>


===Systemd Timer for CentOS 7 & 8 and others===

Based on information from: https://www.certdepot.net/rhel7-use-systemd-timers/


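Create a file ending in .service in the /usr/lib/systemd/system/ directory (/etc/systemd/system/ is the conventional place for locally created units) similar to this, changing paths as appropriate. Name it to match the Unit= line of the timer below (sitemap.WhatEverURL.service here);<syntaxhighlight lang="text">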

[Unit]
Description=Create XML SiteMap for WhatEverURL

[Service]
# oneshot: the script runs to completion and exits, which suits a job started by a timer
Type=oneshot
# --urlpath is a relative path, as in the manual command
ExecStart=/usr/bin/php /var/www/html/WhatEverURL/maintenance/generateSitemap.php --server=https://WhatEverURL --fspath=/var/www/html/WhatEverURL/sitemap --urlpath=sitemap --identifier=WhatEverURL --compress=no --skip-redirects

[Install]
WantedBy=multi-user.target

</syntaxhighlight>Create a file that ends in .timer in the same directory;<syntaxhighlight lang="text">

[Unit]

Description=Timer Service that runs the sitemap.WhatEverURL.service

[Timer]

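# 00/12 in the hour field: start at 00:00 and repeat every 12 hours (00:00 and 12:00)
OnCalendar=*-*-* 00/12:00:00

Unit=sitemap.WhatEverURL.service

[Install]

# timers.target is the usual target for timer units
WantedBy=timers.target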

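</syntaxhighlight>Then reload systemd so it sees the new files, and enable the timer: systemctl daemon-reload, then systemctl enable WhatEverName.timer

And start it with this command: systemctl start WhatEverName.timer

Check its status with: systemctl is-enabled WhatEverName.timer or systemctl is-active WhatEverName.timer (systemctl list-timers also shows when it will run next; with the OnCalendar value above, that is 00:00 and 12:00 every day)

The .service can be tested on its own by running it once: systemctl start WhatEverName.service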

===Handy Command to Find Information about Services===

whereis WhatEverProgramOrService

It doesn't tell everything about a service; case in point, BIND (named), where it doesn't mention the /var/spool/ directory for zone files.<br />
