
DIY SEO: Canonical URLs, .htaccess and Joomla
Posted on 12.08.2009
OK, I had to scour the internet to find the bits and pieces that answer this question: what lines do you need to add to your .htaccess file to get canonical URLs to resolve properly for a Joomla 1.5 site? After an exhaustive search of numerous blogs and forums, I managed to add 6 “rewrites” to my .htaccess file that have solved the problem for my client’s site without causing any issues in the administrator back-end or elsewhere (knock on wood, I’m still testing it out). Before I get to the additions, I should probably explain the whole canonical issue for anyone who is just learning. Basically, Google can count the following URLs as different pages, even though they are the same page as far as you and your site are concerned:
- https://www.yardstickservices.com
- https://www.yardstickservices.com/
- https://www.yardstickservices.com/index.php
- https://yardstickservices.com
- https://yardstickservices.com/
- https://yardstickservices.com/index.php
This applies to every page of your site, which is why it is so important. If you want to get maximum PageRank, you need to get this sorted, because inbound links can end up spread across the variants instead of all counting toward one URL. OK, so here’s the code that I added to my root .htaccess file, with an explanation to follow:
RewriteEngine On
RewriteBase /
# prevents people from accessing anything with phpMyAdmin
# (no leading slash in the pattern: inside .htaccess the path is matched without it)
RewriteRule ^phpMyAdmin https://www.yardstickservices.com/ [R=301,L]
# force www
RewriteCond %{HTTP_HOST} ^yardstickservices\.com$ [NC]
RewriteRule ^(.*)$ https://www.yardstickservices.com/$1 [R=301,L]
# remove index.php within the URL
RedirectMatch permanent index.php/(.*) https://www.yardstickservices.com/$1
# remove index.php at the end of the URL and change to /
RewriteCond %{THE_REQUEST} ^GET\ /.*/index\.(php|html)\ HTTP
RewriteRule (.*)index\.(php|html)$ /$1 [R=301,L]
# Remove index.php from root URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.(php|html) [NC]
RewriteRule ^index\.php$ https://www.yardstickservices.com/ [R=301,L]
# Add a trailing slash to all URL's
RewriteCond %{REQUEST_URI} /+[^\.]+$
RewriteRule ^(.+[^/])$ %{REQUEST_URI}/ [R=301,L]
For everyone's reference, I've pasted in suggestions made by g1smd (below) as his explanation trumps mine.
The six rules reduce to only four, and there are opportunities for other optimisations:
# prevents people from accessing anything with phpMyAdmin
# (pick one of two, second one is preferred)
RewriteRule phpMyAdmin http://www.example.com/ [R=301,L]
RewriteRule phpMyAdmin - [F]
# Add a trailing slash to all root *extensionless* URLs (but I would advise to NOT do that)
RewriteCond %{REQUEST_URI} /[^/.]+$
RewriteRule ^([^/.]+)$ http://www.example.com/$1/ [R=301,L]
# Remove index.php or index.htm/html from URL requests
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(php|html?)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]
# force canonical www if request is for non-www or has port number etc
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
- RewriteBase / is only needed if your site is on cloud hosting like Mosso/Rackspace, so if you don’t know, ask your hosting provider.
- The phpMyAdmin rule stops hackers from reaching any URL containing phpMyAdmin. I redirected these requests to my homepage, but you may want to redirect them to some other page.
- The rest are explained by the comment line (beginning with #) above each rule.
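The RedirectMatch line in the first listing (index.php in the middle of the URL) has no counterpart in the four-rule version. Since g1smd advises in the comments not to mix RedirectMatch with RewriteRule, a RewriteRule-only version might look something like the sketch below. This is an assumed equivalent, not something g1smd posted. It uses www.example.com as a placeholder host and has not been tested against any particular Joomla setup.

```apache
# Sketch only: RewriteRule-only stand-in for
#   RedirectMatch permanent index.php/(.*) ...
# www.example.com is a placeholder; swap in your own canonical host.

# Look at what the browser actually requested, so an internal rewrite
# to index.php (as Joomla's SEF rules do) cannot trigger a redirect loop.
RewriteCond %{THE_REQUEST} ^GET\ /index\.php/
RewriteRule ^index\.php/(.*)$ http://www.example.com/$1 [R=301,L]
```

Unlike the RedirectMatch pattern, which matches index.php/ anywhere in the path, this sketch only handles index.php at the site root, and only for GET requests.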
Now, this may not work for all sites, depending on the extensions you have installed and the URLs that they output, so take this with a grain of salt. Make sure you back up your .htaccess file (and, for that matter, your entire site) before you start messing around, because a mistake here can really bung things up. Otherwise, good luck and here’s to a few more notches on the old pagerank-o-meter.
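Readers in the comments report that the index.php rules can break front-end login, search and access to the administrator back-end. One likely cause is that the "remove index.php" rule also catches form submissions and /administrator/index.php. A more cautious variant of g1smd's rule, offered as a sketch under the assumption that Joomla is installed in the site root and its back-end lives under /administrator, could be:

```apache
# Sketch only, not tested on every setup; www.example.com is a placeholder.

# Leave everything under /administrator alone so the back-end keeps working.
RewriteCond %{REQUEST_URI} !^/administrator/ [NC]

# Match plain GET requests with no query string only. POSTed forms (login)
# and index.php?option=... URLs (search) fall through untouched.
RewriteCond %{THE_REQUEST} ^GET\ /(([^/]+/)*)index\.(php|html?)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]
```

The design choice here is to redirect less rather than more: a 301 turns a POST into a GET in most browsers, so any rule that redirects form submissions will silently drop the submitted data.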
Kevin McLeod
A little update for anyone who uses extensions like ChronoForms. You will want to comment out a few lines as follows:
# remove index.php within the URL
# RedirectMatch permanent index.php/(.*) http://www.yardstick.com
# Remove index.php from root URL
# RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.(php|html) [NC]
# RewriteRule ^index\.php$ http://www.yardstick.com [R=301,L]
Henrik B
Thank you for the explanations, I needed the exclusion of index.php for some sites, have been looking quite a while for this solution.
Henrik B
# remove index.php at the end of the URL and change to /
RewriteCond %{THE_REQUEST} ^GET\ /.*/index\.(php|html)\ HTTP
RewriteRule (.*)index\.(php|html)$ /$1 [R=301,L]
This stops the search function and login option from working correctly. When you try to log in, search, etc., the request calls index.php, so it gets redirected to /. Do you have any solution for this?
Kevin McLeod
I’m not surprised. We actually have to comment out one or more of the above lines depending on the extensions that we have installed on various sites (it’s not a perfect solution, to be sure). Further, I only use the administrator login for my clients’ sites, and very few (if any) of the sites we build have a search function, as they are mostly small business websites. My only suggestion would be to look into one of two SEF URL extensions that I’ve used in the past that might help you resolve the conflict:
sh404SEF – http://extensions.joomla.org/extensions/site-management/sef/2380
ARTIO JoomSEF – http://extensions.joomla.org/extensions/site-management/sef/1063
Kevin McLeod
I just came across a canonical URL extension that works great and is much easier than manually editing the .htaccess file.
http://extensions.joomla.org/extensions/site-management/seo-a-metadata/5355
Jeremy
Thanks, this was really helpful!
Joomla Developer
The article is good and provides useful information about Joomla 1.5 sites. I like the article very much as it is very informative, and I hope to see more such articles.
Sabreen Rezvie (Sam)
Dude, thanks a lot for this tutorial! My site was all messed up, with some URLs using /index.php and some without. Now it's all changed for good.
I was looking for something like this all over the net.
Thanks again.
Kevin McLeod
No problem. Just make sure you test that all of your extensions are working properly because overly aggressive use of these rewrites has been known to cause extensions like Virtuemart to stop working.
Sabreen Rezvie (Sam)
Yeah, front-end login didn't work on the Joomla blog. I removed it since I'm the only one who is using it. So far so good with the other extensions.
Sabreen Rezvie (Sam)
Is there a way to keep the forms working and still redirect index.php?
Kevin McLeod
There are a couple of Joomla extensions that might work for you as well. We use AceSEF quite a lot and sh404SEF isn’t bad either.
Sabreen Rezvie (Sam)
I see. I found a different form in the extensions directory which isn't redirecting to index.php.
Still testing; I’ll let you know if it goes well.
And thanks, I’ll check those 2 extensions as well.
Sabreen Rezvie (Sam)
I am using FlexiContact atm. It works with those redirections.
Rene Beaulieu
Hello Kevin!
Could you tell me how I could get my Joomla site, which currently answers on both http://www.securaglobe.com and http://securaglobe.com, to permanently redirect to just http://www.securaglobe.com?
I am looking for the bit of code that I can put in my .htaccess file. I am not a computer whiz at all, nor do I know how to code. I have tested several snippets from the web and they do not appear to be working.
Kevin McLeod
Hi Rene,
I would suggest you take a look at AceSEF or sh404SEF (both can be found in the Joomla extensions directory). These extensions are easier to use than trying to manually manipulate your .htaccess file with the lines I have listed above. And each extension has a pretty big following, which means you can get help from one of their forums.
salario
Kevin, your example above literally solved all my problems. Thanks for posting this. I searched all day and finally found a solution that actually works to remove index.php, and I love the trailing slash, nice touch.
Thanks!
Salar
Pixelboxdesign
Excellent Article, Added to favs, keep up the good work.
Barton
Q) Should the sitemap reference the original .php or rewritten .htm web pages?
Q) Should the HTML code above reference .php or .htm?
Kevin McLeod
Hey Barton,
With regards to sitemaps, we use an extension called xmaps which automatically generates sitemaps in xml format. That, we find, is the easiest format for submitting and getting indexed via Google Webmaster Tools.
Michel
Thank you very much for your post. I was looking for how to do that without messing up my access to the Joomla back-end. This works like a charm.
Kevin McLeod
Hey OC SEO,
I think this is the line you want:
# remove index.php within the URL
RedirectMatch permanent index.php/(.*)
With that being said, I’m not as savvy with WordPress as I am with Joomla. We here at Yardstick have been using JoomAce’s AceSEF with much success to set up and manage a lot of the redirects and canonicalization that we used to handle with “rewrites”. And Joomla 1.6 has some built-in redirect functionality as well, so stay tuned for that.
g1smd
If you use RewriteRule for any of your rules, use it for ALL of your rules. Never mix Redirect/RedirectMatch with RewriteRule in the same site. It can cause redirects to be processed in a different order to what you expected.
Kevin McLeod
Hats off to g1smd for this. I admit, I gathered these from other sources so it’s good to see someone with some real expertise clarify the finer points. Thanks mate!
Monte Nemi
Hi,
I applied your suggestion as regards removing index.php from URL requests. Afterwards I couldn’t get into the site administration in the backend. Is there a way to resolve this?
Thanks in advance.
brendonjmcleod
I would suggest using an FTP client such as FileZilla to access the root directory of your website. There you will be able to edit the .htaccess file and comment out the rule that is locking you out. And always be sure to back up your site first, just in case.