Webmaster Resources: Search Engine Optimization Information
How to Prevent Duplicate Content with Effective Use of the Robots.txt and Robots Meta Tag
Duplicate content is one of the problems we regularly come across as part of the search engine optimization services we offer. If the search engines determine that your site contains duplicate or near-duplicate content, this may result in penalties and even exclusion from the search engines. Fortunately, it's a problem that is easily rectified.

Your primary weapon against duplicate content can be found within "The Robot Exclusion Protocol", which has now been adopted by all the major search engines.

There are two ways to control how the search engine spiders index your site:

1. The Robot Exclusion File or "robots.txt"
2. The Robots <meta> Tag

The Robots Exclusion File (Robots.txt)

The robots.txt file is best suited to static HTML sites, or to excluding certain files in dynamic sites. If the majority of your site is dynamically created, consider using the Robots Tag.

Creating your robots.txt file

Example 1
Scenario
User-agent: *
Explanation

Example 2
Scenario
User-agent: *
Explanation

Example 3
Scenario
User-agent: googlebot
Explanation
By naming a particular search spider in the "User-agent" line, you prevent that spider from crawling the content you specify. Access to directories is blocked by simply naming them, and a specific page is referenced directly. The named files and directories will not be crawled by Google.

That's all there is to it!

As mentioned earlier, the robots.txt file can be difficult to implement on dynamic sites, in which case it's probably necessary to use a combination of robots.txt and the robots tag.

The Robots Tag

In this example we are telling all search engines not to index the page or to follow any of the links contained within the page.

In this second example I don't want Google to cache the page, because the site contains time-sensitive information. This can be achieved simply by adding the "noarchive" directive.

What could be simpler!
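The two robots tags described above can be illustrated and checked with a short script. This is a minimal sketch, not a definitive implementation: it assumes the commonly used tag forms `<meta name="robots" content="noindex,nofollow">` (all spiders) and `<meta name="googlebot" content="noarchive">` (Google only), and it uses Python's standard `html.parser` module. The helper name `robots_directives` and the sample pages are hypothetical.

```python
from html.parser import HTMLParser

# Assumed tag forms for the two cases in the text (illustrative pages only).
PAGE_ALL_SPIDERS = (
    '<html><head><meta name="robots" content="noindex,nofollow">'
    "</head><body></body></html>"
)
PAGE_GOOGLE_NO_CACHE = (
    '<html><head><meta name="googlebot" content="noarchive">'
    "</head><body></body></html>"
)


class RobotsMetaParser(HTMLParser):
    """Collects directives from robots-style <meta> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # e.g. {"robots": {"noindex", "nofollow"}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").lower()
        # Only "robots" (all spiders) and "googlebot" are handled in this sketch.
        if name in ("robots", "googlebot"):
            content = attr_map.get("content", "")
            found = {d.strip().lower() for d in content.split(",") if d.strip()}
            self.directives.setdefault(name, set()).update(found)


def robots_directives(html):
    """Return a dict mapping the meta name to its set of directives."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    print(robots_directives(PAGE_ALL_SPIDERS))
    print(robots_directives(PAGE_GOOGLE_NO_CACHE))
```

Running a check like this over the pages a dynamic site generates is one way to confirm that the intended tag actually appears in the page head.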
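For the robots.txt approach described earlier, the rules can be tested before they go live. The sketch below is a hedged illustration using Python's standard `urllib.robotparser` module. The directory names, the page name, and the spider name `otherbot` are hypothetical placeholders chosen for this example, and `is_allowed` is a helper introduced here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: a group for Google's spider and a group for all others.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /print/
Disallow: /archive/
Disallow: /duplicate-page.html

User-agent: *
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent, path in [
        ("googlebot", "/print/page.html"),
        ("googlebot", "/index.html"),
        ("otherbot", "/print/page.html"),
        ("otherbot", "/cgi-bin/search"),
    ]:
        print(agent, path, is_allowed(ROBOTS_TXT, agent, path))
```

Note that a spider with its own named group follows only that group, so in this sketch the `/cgi-bin/` rule under `User-agent: *` does not apply to googlebot.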
Although there are other ways of preventing duplicate content from appearing in the search engines, this is the simplest to implement, and every website should use a robots.txt file, a robots tag, or a combination of the two. Should you require further information about our search engine marketing or optimization services, please visit us at http://www.e-prominence.co.uk - The search marketing company