Harmony Hollow Software

How to Prevent Duplicate Content with Effective Use of the Robots.txt and Robots Meta Tag


Duplicate content is one of the problems that we regularly come across as part of the search engine optimization services we offer. If the search engines determine that your site contains duplicate or near-duplicate content, this may result in penalties and even exclusion from the search engines. Fortunately, it's a problem that is easily rectified.

Your primary weapon against duplicate content can be found within the "Robots Exclusion Protocol", which has now been adopted by all the major search engines.

There are two ways to control how the search engine spiders index your site:

1. The Robots Exclusion File, or "robots.txt", and

2. The Robots <meta> Tag

The Robots Exclusion File (Robots.txt)
This is a simple text file that can be created in Notepad. Once created, you must upload the file into the root directory of your website, e.g. www.yourwebsite.com/robots.txt. Before a search engine spider indexes your website, it looks for this file, which tells it exactly how to index your site's content.
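As a rough illustration of where a spider looks for the file, the short Python sketch below derives the robots.txt address from any page address on the same site. The function name robots_txt_url is made up for this example, and www.yourwebsite.com is the placeholder domain used above.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Build the robots.txt address for the site that serves page_url.

    Hypothetical helper for illustration: it keeps the scheme and host
    and replaces the rest of the address with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Any page on the site maps to the same file in the root directory.
print(robots_txt_url("http://www.yourwebsite.com/faq/index.html?page=2"))
```

Whatever page is passed in, the result points at the root directory, which is why the file has no effect if it is uploaded into a subfolder.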

The robots.txt file is best suited to static HTML sites or to excluding certain files in dynamic sites. If the majority of your site is dynamically created, then consider using the Robots Tag.

Creating your robots.txt file

Example 1 Scenario
Suppose you wanted to make the robots.txt file applicable to all search engine spiders and make the entire site available for indexing. The robots.txt file would look like this:

User-agent: *
Disallow:

Explanation
The use of the asterisk with the "User-agent" means this robots.txt file applies to all search engine spiders. By leaving the "Disallow" blank, you make all parts of the site available for indexing.
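One way to check a rule set like this before uploading it is Python's standard urllib.robotparser module. The sketch below is only an illustration: the helper name allowed, the spider name "anybot" and the sample path are invented for this article, and the module's matching may differ in detail from that of any particular search engine. It also shows the opposite case, a single slash after "Disallow", which blocks the whole site.

```python
from urllib.robotparser import RobotFileParser

# The rules from Example 1: an empty Disallow blocks nothing.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# The opposite case: a single slash blocks every path on the site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt, agent, path):
    """Return True if the spider named agent may fetch path (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


print(allowed(ALLOW_ALL, "anybot", "/index.html"))
print(allowed(BLOCK_ALL, "anybot", "/index.html"))
```

The two files differ by one character, so it is worth testing the file rather than trusting a quick visual check.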

Example 2 Scenario
If you wanted to make the robots.txt file applicable to all search engine spiders and to stop the spiders from indexing the faq, cgi-bin and images directories and a specific page called faqs.html contained within the root directory, the robots.txt file would look like this:

User-agent: *
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html

Explanation
The use of the asterisk with the "User-agent" means this robots.txt file applies to all search engine spiders. Preventing access to the directories is achieved by naming them, and the specific page is referenced directly. The named files and directories will now not be indexed by any search engine spider.
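The rules in Example 2 match on the start of the path, which is easy to get wrong when a directory and a page have similar names. The sketch below, again using Python's standard urllib.robotparser, is a hedged illustration rather than a definitive test: the helper blocked_paths, the spider name "anybot" and the sample paths (including faq.html, which is not in the example) are assumptions made for this article.

```python
from urllib.robotparser import RobotFileParser

# The rules from Example 2.
EXAMPLE_2 = """\
User-agent: *
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html
"""


def blocked_paths(robots_txt, agent, paths):
    """Return the subset of paths that agent may not fetch (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


sample = [
    "/index.html",       # not named in the file
    "/faq/index.html",   # inside the faq directory
    "/faq.html",         # similar name, but not under /faq/
    "/faqs.html",        # the page named directly
    "/images/logo.png",  # inside the images directory
]
print(blocked_paths(EXAMPLE_2, "anybot", sample))
```

Note that the trailing slash matters: "/faq/" covers everything inside the directory but leaves a page such as faq.html in the root directory untouched.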

Example 3 Scenario
If you wanted to make the robots.txt file applicable only to the Google spider, googlebot, and stop it from indexing the faq, cgi-bin and images directories and a specific HTML page called faqs.html contained within the root directory, the robots.txt file would look like this:

User-agent: googlebot
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html

Explanation
By naming the particular search spider in the "User-agent" you prevent it from indexing the content you specify. Preventing access to the directories is achieved by simply naming them, and the specific page is referenced directly. The named files and directories will not be indexed by Google.
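Because Example 3 names only googlebot, other spiders are not restricted by it. A minimal sketch of that difference, using Python's standard urllib.robotparser, follows. The helper may_fetch and the spider name "otherbot" are invented for illustration, and real search engines may match user-agent names somewhat differently from this module.

```python
from urllib.robotparser import RobotFileParser

# The rules from Example 3, which apply to googlebot only.
EXAMPLE_3 = """\
User-agent: googlebot
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html
"""


def may_fetch(robots_txt, agent, path):
    """Return True if agent may fetch path under the given rules (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The named spider is kept out of the listed areas...
print(may_fetch(EXAMPLE_3, "Googlebot/2.1", "/faqs.html"))
# ...but a spider that is not named has no rules applied to it.
print(may_fetch(EXAMPLE_3, "otherbot", "/faqs.html"))
```

If the same restrictions should apply to every spider, the asterisk form from Example 2 is the one to use.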

That's all there is to it!

As mentioned earlier, the robots.txt file can be difficult to implement on dynamic sites. In that case it's probably necessary to use a combination of the robots.txt file and the Robots Tag.

The Robots Tag
This alternative way of telling the search engines what to do with site content appears in the <head> section of a web page. A simple example would be as follows:

<meta name="robots" content="noindex,nofollow">

In this example we are telling all search engines not to index the page or to follow any of the links contained within the page.

In this second example I don't want Google to cache the page, because the site contains time-sensitive information. This can be achieved simply by adding the "noarchive" directive, for example:

<meta name="googlebot" content="noarchive">
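To confirm that a page really carries the directives you intended, the tags can be read back out of the HTML. The sketch below uses Python's standard html.parser module and assumes the tags take the common forms <meta name="robots" content="noindex,nofollow"> and <meta name="googlebot" content="noarchive">. The class name RobotsMetaParser, the function robots_directives and the sample page are invented for this illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from robots-style meta tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        # Assumption: only the generic "robots" name and "googlebot" are of interest.
        if name in ("robots", "googlebot"):
            values = {d.strip().lower() for d in attr.get("content", "").split(",")}
            values.discard("")
            self.directives.setdefault(name, set()).update(values)


def robots_directives(html):
    """Return a dict mapping each meta name to its set of directives."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


PAGE = """<html><head>
<title>Sample page</title>
<meta name="robots" content="noindex,nofollow">
<meta name="googlebot" content="noarchive">
</head><body><p>Time sensitive content</p></body></html>"""

for name, values in sorted(robots_directives(PAGE).items()):
    print(name, sorted(values))
```

A check like this is most useful on dynamic sites, where the tag is produced by a template and can silently go missing on some pages.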

What could be simpler!

Although there are other ways of preventing duplicate content from appearing in the search engines, this is the simplest to implement, and all websites should use a robots.txt file, a Robots Tag, or a combination of the two.

Should you require further information about our search engine marketing or optimization services please visit us at http://www.e-prominence.co.uk - The search marketing company

© 2008