DIY SEO: Canonical URL’s, htaccess and Joomla

Posted on 12.08.2009

OK, I had to scour the internet to find bits and pieces of the answer to this question: what lines do you need to add to your .htaccess file to get canonical URLs to resolve properly for a Joomla 1.5 site? After an exhaustive search of numerous blogs and chat forums, I managed to add six “rewrites” to my .htaccess file that have solved the problem for my client’s site without causing any issues in the administrator back-end or otherwise (knock on wood, I’m still testing it out).

Before I get to the additions, I should probably explain the whole canonical issue for anyone who is just learning. Basically, Google counts the following URLs as different pages even though they are essentially the same page as far as you and your site are concerned:

  • https://www.yardstickservices.com
  • https://www.yardstickservices.com/
  • https://www.yardstickservices.com/index.php
  • https://yardstickservices.com
  • https://yardstickservices.com/
  • https://yardstickservices.com/index.php

And this applies to all the pages of your site, which is why it is so important. If you want to get maximum PageRank, you need to get this sorted: every variant should 301-redirect to a single canonical URL. OK, so here’s the code that I added to my root .htaccess file, with an explanation to follow:

RewriteEngine On
RewriteBase /
# prevents people from accessing anything with phpMyAdmin
# (no leading slash in the pattern: in .htaccess the path is matched without it)
RewriteRule ^phpMyAdmin.*$ https://www.yardstickservices.com/ [R=301,L]
# force www
RewriteCond %{HTTP_HOST} ^yardstickservices\.com$ [NC]
RewriteRule ^(.*)$ https://www.yardstickservices.com/$1 [R=301,L]
# remove index.php within the URL
RedirectMatch permanent index.php/(.*) https://www.yardstickservices.com/$1
# remove index.php at the end of the URL and change to /
RewriteCond %{THE_REQUEST} ^GET\ /.*/index\.(php|html)\ HTTP
RewriteRule (.*)index\.(php|html)$ /$1 [R=301,L]
# Remove index.php from root URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.(php|html) [NC]
RewriteRule ^index\.php$ https://www.yardstickservices.com/ [R=301,L]
# Add a trailing slash to all URL's
RewriteCond %{REQUEST_URI} /+[^\.]+$
RewriteRule ^(.+[^/])$ %{REQUEST_URI}/ [R=301,L]

For everyone’s reference, I’ve pasted in the suggestions made by g1smd below, as his explanation trumps mine.

The six rules reduce to only four, and there are opportunities for other optimisations:
# prevents people from accessing anything with phpMyAdmin
# (use only one of the two lines below; the second one is preferred)
RewriteRule phpMyAdmin http://www.example.com/ [R=301,L]
RewriteRule phpMyAdmin - [F]
# Add a trailing slash to all root *extensionless* URLs (but I would advise to NOT do that)
RewriteCond %{REQUEST_URI} /[^/.]+$
RewriteRule ^([^/.]+)$ http://www.example.com/$1/ [R=301,L]
# Remove index.php or index.htm/html from URL requests
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(php|html?)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]
# force canonical www if request is for non-www or has port number etc
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

  1. RewriteBase / sets the URL prefix that Apache uses for relative substitutions in per-directory rules. In my experience it is only needed if your site is on cloud hosting like Mosso/Rackspace, so if you don’t know, ask your hosting provider.
  2. The first rule stops hackers from reaching any URL containing phpMyAdmin. I redirected these requests to my homepage, but you may want to redirect them to some other page.
  3. The rest are explained by the comment line (beginning with #) above each rule.
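
The four-rule version above has no counterpart to the RedirectMatch line in the first listing, which belongs to mod_alias rather than mod_rewrite. If index.php still needs to be stripped from the middle of a URL, the same redirect can probably be written as a RewriteRule. The following is only a sketch: www.example.com is a placeholder host, the THE_REQUEST condition is an assumption about how to leave internal rewrites to index.php alone, and it presumes Joomla’s “Use Apache mod_rewrite” SEO setting is on so that the shortened URL still resolves.

```apache
# Sketch only: mod_rewrite version of "remove index.php within the URL".
# www.example.com is a placeholder host.
# THE_REQUEST holds the request line exactly as the client sent it, so this
# condition ignores paths that were only rewritten to index.php internally.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php/
# /index.php/some/page  ->  http://www.example.com/some/page
RewriteRule ^index\.php/(.*)$ http://www.example.com/$1 [R=301,L]
```

It would sit with the other external redirects, ahead of the final host-name rule, so that a request is corrected in one hop.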

Now, this may not work for all sites, depending on the extensions you have installed and the URLs they output, so take this with a grain of salt. Make sure you back up your .htaccess file (and, for that matter, your entire site) before you start messing around, because a mistake here can really bung things up. Otherwise, good luck, and here’s to a few more notches on the old pagerank-o-meter.
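
If an extension’s forms or the administrator login stop working after these changes, one cautious variation is to limit the index-file redirect to plain GET requests outside the back end. This is a sketch under stated assumptions, not part of the original rules: www.example.com is a placeholder host, and /administrator/ is assumed to be the standard Joomla back-end path.

```apache
# Sketch only: a more conservative "remove index.php/index.html" redirect.
# www.example.com is a placeholder host.
# Skip form submissions: most browsers re-issue a redirected POST as a GET,
# which drops the submitted form data.
RewriteCond %{REQUEST_METHOD} =GET
# Skip the Joomla back end (assumed to live under /administrator/).
RewriteCond %{REQUEST_URI} !^/administrator/
# Match only what the client literally requested, with no query string.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(php|html?)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]
```

Because the THE_REQUEST pattern ends at the space before HTTP/, requests such as /index.php?option=... carry a query string and are left untouched.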


Comments (26)

  • A little update for anyone that uses extensions like chronoforms. You will want to comment out a few lines as follows:
    # remove index.php within the URL
    # RedirectMatch permanent index.php/(.*) http://www.yardstick.com
    # Remove index.php from root URL
    # RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.(php|html) [NC]
    # RewriteRule ^index\.php$ http://www.yardstick.com [R=301,L]

  • Thank you for the explanations, I needed the exclusion of index.php for some sites, have been looking quite a while for this solution.

  • # remove index.php at the end of the URL and change to /
    RewriteCond %{THE_REQUEST} ^GET\ /.*/index\.(php|html)\ HTTP
    RewriteRule (.*)index\.(php|html)$ /$1 [R=301,L]
    This stops the search function and login option from working correctly. It redirects to / because of the index.php call in the request when trying to log in, search, etc. Do you have any solution for this?

  • I’m not surprised, and we actually have to comment out one or more of the above lines depending on the extensions that we have installed on various sites (it’s not a perfect solution, to be sure). Further, I only use the administrator login for my client’s sites, and very few (if any) of the sites we build have a search function, as they are mostly small business websites. My only suggestion would be to look into one of two SEF URL extensions that I’ve used in the past that might be able to help you resolve the conflict:
    sh404SEF – http://extensions.joomla.org/extensions/site-management/sef/2380
    ARTIO JoomSEF – http://extensions.joomla.org/extensions/site-management/sef/1063

  • The article is good and provides useful information about Joomla 1.5 sites. I like the article very much as it is very informative, and I hope to see more such articles.

  • Sabreen Rezvie (Sam)

    Dude, thanks a lot for this tutorial! My site was all messed up with /index.php and without it. Now it’s all changed for good.
    I was looking for something like this all over the net.
    Thanks again.

    • No problem. Just make sure you test that all of your extensions are working properly because overly aggressive use of these rewrites has been known to cause extensions like Virtuemart to stop working.

      • Sabreen Rezvie (Sam)

        Yeah, front-end login didn’t work on the Joomla blog. I removed it since I’m the only one who is using it. So far so good with the other extensions.

    • There’s a couple Joomla extensions that might work for you as well. We use AceSEF quite a lot and sh404sef isn’t bad either.

      • Sabreen Rezvie (Sam)

        I see. I found a different form in extensions which isn’t redirecting to index.php.
        Still testing; I’ll let you know if it goes well.
        And thanks, I’ll check those 2 extensions as well.

    • Hi Rene,
      I would suggest you take a look at AceSEF or sh404SEF (both can be found in the Joomla extensions directory). These extensions are easier to use than trying to manually manipulate your .htaccess file with the lines I have listed above. And each extension has a pretty big following, which means you can get help from one of their chat forums.

  • Kevin, your example above literally solved all my problems. Thanks for posting this….I searched all day and finally a solution that actually works to remove index.php and I love the leading slash, nice touch.
    Thanks!
    Salar

  • Q) Should the sitemap reference the original .php or rewritten .htm web pages?
    Q) Should the HTML code above reference .php or .htm?

    • Hey Barton,
      With regards to sitemaps, we use an extension called xmaps which automatically generates sitemaps in xml format. That, we find, is the easiest format for submitting and getting indexed via Google Webmaster Tools.

  • Thank you very much for your post. I was looking for how to do that without messing up my access to the joomla back-end. This works like a charm

  • Hey OC SEO,
    I think this is the line you want:
    # remove index.php within the URL
    RedirectMatch permanent index.php/(.*)
    With that being said, I’m not as savvy with WordPress as I am with Joomla. We here at Yardstick have been using JoomAce’s AceSEF with much success to set up and manage a lot of the redirects and canonicalization that we used to handle with “rewrites”. And Joomla 1.6 has some built-in redirect functionality as well, so stay tuned for that.

  • If you use RewriteRule for any of your rules, use it for ALL of your rules. Never mix Redirect/RedirectMatch with RewriteRule in the same site. It can cause redirects to be processed in a different order to what you expected.

    • Hats off to g1smd for this. I admit, I gathered these from other sources so it’s good to see someone with some real expertise clarify the finer points. Thanks mate!

  • Hi,
    I applied your suggestion as regards removing index.php from URL requests. Afterwards I couldn’t get into the site administration in the backend. Is there a way to resolve this?
    Thanks in advance.

    • I would suggest using an FTP client such as FileZilla to access the root directory of your website. There you will be able to edit the .htaccess file. And always be sure to back up your site, just in case.
