WordPress robots.txt tips against duplicate content

Posted by Dan | Posted in Shoemoney | Posted on 03-03-2008

Been getting some questions about my robots.txt file and what certain things do.

Thankfully the major search engines support a little pattern matching in robots.txt. It is not full regular expressions, just two special characters: * (match anything) and $.

$ means the end of the URL, the same way it means the end of the string in regex. So if you put .php$ in your robots.txt, it will match any URL that ends in .php.

This is really handy when you want to block all .exe, .php or other file types. For example:

Disallow: /*.PDF$
Disallow: /*.jpeg$
Disallow: /*.exe$
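
To make the matching concrete, here is a rough Python sketch of how a crawler might interpret * and $ in a Disallow path. The helper names (robots_pattern_to_regex, is_blocked) and the sample URLs are made up for illustration, and real crawlers may differ in the details. It also assumes matching is case-sensitive, so a rule written as .PDF would not catch a lowercase .pdf file.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' pins the
    match to the end of the URL. Everything else is taken literally,
    and the pattern always matches from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + (r"\Z" if anchored else ""))

def is_blocked(path, pattern):
    """Return True if a Disallow rule with this pattern covers the path."""
    return robots_pattern_to_regex(pattern).match(path) is not None

print(is_blocked("/files/setup.exe", "/*.exe$"))       # True
print(is_blocked("/files/setup.exe?dl=1", "/*.exe$"))  # False, URL does not end in .exe
print(is_blocked("/docs/guide.pdf", "/*.PDF$"))        # False, case does not match
```

The second call shows why the $ matters: once a query string is appended, the URL no longer ends in .exe and the rule stops applying.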

Specifically, these are some of the things I use in my robots.txt:

Disallow: /*? – this blocks all URLs with a ? in them. A good way to avoid duplicate content issues with WordPress blogs. Obviously you only want to use this if you have changed your permalink structure away from the default query-string URLs (the ?p=123 kind), otherwise you block every post.

Disallow: /*.php$ – this blocks all .php files. Another good way to avoid duplicate content with a WordPress blog.

Disallow: /*.inc$ – you should not be showing .inc or include files to bots (Google Code Search will eat you alive).

Disallow: /*.css$ – showing CSS files for indexing seems silly. The wildcard is used here in case there are many CSS files.

Disallow: */feed/ – feeds being indexed dilute your site equity. The wildcard * is used in case there are preceding characters.

Disallow: */trackback/ – no reason a trackback URL should be indexed. The wildcard * is used in case there are preceding characters.

Disallow: /page/ – assloads of duplicate content in the paginated pages WordPress generates.

Disallow: /tag/ – more duplicate content.

Disallow: /category/ – even more duplicate content.

So what if you want to ALLOW a page? For instance, my SERPs tool is serps.php, and under the rules above it would be blocked.

Allow: /serps.php – this does the trick!
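
Why the Allow line beats the broader Disallow is not spelled out above. The sketch below assumes the precedence rule described in Google's robots.txt documentation and RFC 9309: the longest matching rule wins, and Allow wins a tie. The RULES list, the function names and the test paths are illustrative only, and other crawlers may resolve conflicts differently.

```python
import re

# A few of the rules from this post, in (kind, pattern) form.
RULES = [
    ("disallow", "/*?"),
    ("disallow", "/*.php$"),
    ("allow", "/serps.php"),
]

def matches(pattern, path):
    """True if a robots.txt pattern ('*' wildcard, trailing '$') covers path."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.match("^" + body + (r"\Z" if anchored else ""), path) is not None

def is_allowed(path, rules=RULES):
    """Assumed precedence: longest matching pattern wins, Allow wins ties."""
    hits = [(len(pat), kind == "allow") for kind, pat in rules if matches(pat, path)]
    if not hits:
        return True  # no rule applies, so crawling is permitted
    # Tuples compare by length first; True > False breaks ties toward Allow.
    return max(hits)[1]

print(is_allowed("/serps.php"))  # True, the Allow pattern is longer than /*.php$
print(is_allowed("/index.php"))  # False, only the Disallow matches
print(is_allowed("/?p=12"))      # False, caught by /*?
```

Under this assumption the order of the lines in the file does not matter. Only the length of the matching pattern does.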

Keep in mind I am not an SEO, but I have picked up a few tricks along the way.
