

How to Prevent Duplicate Content with Effective Use of the Robots.txt and Robots Meta Tag


Duplicate content is one of the problems that we regularly come across as part of the search engine optimization services we offer. If the search engines determine that your site contains similar content, this may result in penalties or even exclusion from their results. Fortunately, it's a problem that is easily rectified.

Your primary weapon against duplicate content is the Robots Exclusion Protocol, which has now been adopted by all the major search engines.

There are two ways to control how search engine spiders index your site:

1. The Robots Exclusion File, or "robots.txt"

2. The Robots Meta Tag

The Robots Exclusion File (Robots.txt)
This is a simple text file that can be created in Notepad or any other plain-text editor. Once created, you must upload the file to the root directory of your website, e.g. www.yourwebsite.com/robots.txt. Before a search engine spider crawls your website, it looks for this file, which tells it which parts of your site it may and may not crawl.
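Because spiders only look for the file at the root of the host, the robots.txt address for any page can be worked out from the page's URL alone. The following is a minimal Python sketch using only the standard library; the function name "robots_txt_url" and the example URLs are hypothetical, chosen for illustration.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url: str) -> str:
    """Return the address where a spider looks for robots.txt.

    Only the scheme and host (including any port) of the page URL are kept;
    the path, query string and fragment are discarded, because the file is
    only read from the root directory of the site.
    """
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


# A deep page still maps to the single file at the site root.
print(robots_txt_url("http://www.yourwebsite.com/products/widgets.html?id=7"))
# prints http://www.yourwebsite.com/robots.txt
```

A robots.txt file placed in a subdirectory, such as www.yourwebsite.com/pages/robots.txt, is never requested, which is why the upload location matters.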

The robots.txt file is best suited to static HTML sites, or to excluding certain files on dynamic sites. If most of your site is dynamically generated, consider using the robots meta tag instead.

Creating your robots.txt file

Example 1 Scenario
If you want the file to apply to all search engine spiders and make the entire site available for indexing, the robots.txt file would look like this:

User-agent: *
Disallow:

Explanation
The asterisk in the "User-agent" line means this robots.txt file applies to all search engine spiders. By leaving the "Disallow" value blank, all parts of the site are available for indexing.
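Python's standard-library "urllib.robotparser" module implements the rules of the Robots Exclusion Protocol, which makes it a convenient way to check a robots.txt file before uploading it. The minimal sketch below, assuming Python 3 and a hypothetical spider name "AnyBot", feeds it the Example 1 file and confirms that every path is open to crawling.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from Example 1: one record for all spiders, blank Disallow.
EXAMPLE_1 = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_1.splitlines())  # parse() accepts an iterable of lines

# "AnyBot" is a hypothetical spider name; the "*" record applies to it.
for path in ["/", "/faq/", "/images/logo.png", "/faqs.html"]:
    verdict = "allowed" if parser.can_fetch("AnyBot", path) else "blocked"
    print(f"{path:<18} {verdict}")
```

Every line of output reads "allowed", because a blank "Disallow" value excludes nothing.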

Example 2 Scenario
If you want the file to apply to all search engine spiders and stop them from indexing the faq, cgi-bin and images directories, as well as a specific page called faqs.html in the root directory, the robots.txt file would look like this:

User-agent: *
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html

Explanation
The asterisk in the "User-agent" line means this robots.txt file applies to all search engine spiders. Access to each directory is blocked by naming it on its own "Disallow" line, and the specific page is referenced directly. The named files and directories will now not be indexed by any search engine spider that obeys the protocol.
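The same standard-library parser can confirm which URLs the Example 2 file blocks. In this minimal sketch the sample paths (other than those named in the file) and the spider name "AnyBot" are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from Example 2.
EXAMPLE_2 = """\
User-agent: *
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_2.splitlines())

# Each Disallow value is matched as a prefix of the requested path.
paths = [
    "/faq/index.html",      # inside the /faq/ directory
    "/cgi-bin/search.cgi",  # inside the /cgi-bin/ directory
    "/images/logo.png",     # inside the /images/ directory
    "/faqs.html",           # the single page named in the file
    "/index.html",          # matches no Disallow line
]
for path in paths:
    verdict = "allowed" if parser.can_fetch("AnyBot", path) else "blocked"
    print(f"{path:<22} {verdict}")
```

Because matching is by prefix, the "/faq/" line alone does not cover /faqs.html; that is why the page needs a line of its own.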

Example 3 Scenario
If you want the file to apply only to Google's spider, Googlebot, and stop it from indexing the faq, cgi-bin and images directories, as well as a specific HTML page called faqs.html in the root directory, the robots.txt file would look like this:

User-agent: googlebot
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html

Explanation

By naming a particular search spider in the "User-agent" line, you apply the rules that follow to that spider only. Access to the directories is blocked simply by naming them, and the specific page is referenced directly. The named files and directories will not be indexed by Google, while other search engine spiders are unaffected by this record.
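To see that a named record only binds the spider it names, the minimal sketch below parses the Example 3 file with Python's "urllib.robotparser" and asks about one blocked URL on behalf of three spiders. "Bingbot" stands in for any spider not named in the file; the exact agent strings are illustrative.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from Example 3: a single record addressed to Googlebot.
EXAMPLE_3 = """\
User-agent: googlebot
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_3.splitlines())

# User-agent names are compared case-insensitively, and this parser ignores
# any version suffix after "/" (e.g. "Googlebot/2.1").
for agent in ["Googlebot", "Googlebot/2.1", "Bingbot"]:
    verdict = "allowed" if parser.can_fetch(agent, "/faq/index.html") else "blocked"
    print(f"{agent:<14} /faq/index.html {verdict}")
```

Googlebot is blocked, while Bingbot is allowed because the file contains neither a record naming it nor a "*" record.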

That's all there is to it!

As mentioned earlier, the robots.txt file can be difficult to implement on dynamic sites, and in that case it's probably necessary to use a combination of the robots.txt file and the robots meta tag.

The Robots Meta Tag
This alternative way of telling the search engines what to do with a page's content is placed in the <head> section of the web page. A simple example would be as follows:

<meta name="robots" content="noindex, nofollow">

In this example we are telling all search engines not to index the page and not to follow any of the links it contains.
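To illustrate how a spider reads the tag, the sketch below uses Python's standard-library "html.parser" to collect the directives from any robots meta tag in a page and then answer the two questions the example asks: may the page be indexed, and may its links be followed? The class and function names and the sample page, which assumes the tag <meta name="robots" content="noindex, nofollow">, are hypothetical. Real spiders also read crawler-specific names such as "googlebot", which this sketch ignores.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(page_html: str) -> set:
    """Return the set of robots directives declared in page_html."""
    parser = RobotsMetaParser()
    parser.feed(page_html)
    parser.close()
    return parser.directives


PAGE = """<html><head>
<title>Example page</title>
<meta name="robots" content="noindex, nofollow">
</head><body><a href="/other.html">Another page</a></body></html>"""

directives = robots_directives(PAGE)
print("may index:", "noindex" not in directives)          # may index: False
print("may follow links:", "nofollow" not in directives)  # may follow links: False
```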

In a second example, suppose I don't want Google to cache the page because the site contains time-sensitive information. This can be achieved simply by adding the "noarchive" directive to the tag.
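On a dynamic site the tag is usually generated, so the directive list can be assembled in code. The minimal Python sketch below renders a robots meta tag from a list of directives; the function name "robots_meta_tag" is hypothetical. It can address all spiders (name="robots") or Google's spider alone (name="googlebot"), which suits the case where only Google's cached copy is the concern.

```python
from html import escape


def robots_meta_tag(directives, name="robots"):
    """Render a robots meta tag such as <meta name="robots" content="noindex">.

    Directives are stripped, lower-cased and de-duplicated while keeping
    their original order. `name` selects which spiders the tag addresses.
    """
    cleaned = []
    for directive in directives:
        directive = directive.strip().lower()
        if directive and directive not in cleaned:
            cleaned.append(directive)
    content = ", ".join(cleaned)
    return f'<meta name="{escape(name)}" content="{escape(content)}">'


# Add "noarchive" to the directives from the first example (all spiders)...
print(robots_meta_tag(["noindex", "nofollow", "noarchive"]))
# prints <meta name="robots" content="noindex, nofollow, noarchive">

# ...or ask only Google's spider not to keep a cached copy.
print(robots_meta_tag(["noarchive"], name="googlebot"))
# prints <meta name="googlebot" content="noarchive">
```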

What could be simpler!

Although there are other ways of preventing duplicate content from appearing in the search engines, this is the simplest to implement, and every website should use a robots.txt file, a robots meta tag, or a combination of the two.

Should you require further information about our search engine marketing or optimization services please visit us at http://www.e-prominence.co.uk - The search marketing company

© 2008