Google is one of the largest search engines, but it also serves ads on many of our sites.
So what is a robots.txt file, and how do you modify it so that it works well with Google?
What Is a robots.txt File?
Robots are automated Internet tools that visit websites and try to read and index their content. Search engine robots are interested in grabbing the keywords and links within a site so that its pages can be included in search results for their users.
To do this they read the HTML content of each page, using algorithms to weed out fake content. They read the page title and the headlines and weigh them against the rest of the content. They may also look at the meta keywords and description located in the header, but many robots actually read your pages like a human would. Or at least they try to.
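As a rough illustration of the title-and-headline reading described above, here is a minimal sketch using Python's standard html.parser module. The class name HeadlineReader and the helper read_headlines are made up for this example, and real search engine robots are far more sophisticated than this.

```python
from html.parser import HTMLParser


class HeadlineReader(HTMLParser):
    """Collects the text of <title> and <h1>-<h3> elements (hypothetical helper)."""

    WATCHED = {"title", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.current = None  # watched tag we are currently inside, if any
        self.found = []      # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.WATCHED:
            self.current = tag
            self.found.append((tag, ""))

    def handle_data(self, data):
        # Only keep text that appears inside a watched tag.
        if self.current is not None:
            tag, text = self.found[-1]
            self.found[-1] = (tag, text + data)

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None


def read_headlines(html):
    """Return the (tag, text) pairs a simple robot might weigh most heavily."""
    reader = HeadlineReader()
    reader.feed(html)
    reader.close()
    return [(tag, text.strip()) for tag, text in reader.found]


page = "<html><head><title>My Site</title></head><body><h1>Welcome</h1><p>Body text</p></body></html>"
print(read_headlines(page))
```

The body paragraph is skipped on purpose here; a real robot would read it too and compare it against the title and headlines.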
Robots used for advertising banners read the page and serve an ad banner based on its content. This often happens in real time and is transparent to your visitors. It can place an extra load on your server, because each page view may trigger two, three, four or more additional hits to your pages just to serve your ads.
So what can you do to control these bots? A robots.txt file contains instructions that NICE bots will read, and this lets you control how they browse through your site. However, even some large search engine companies ignore this file, and in that case you will need to set up a .htaccess rule to control the bots' access to your site.
Since we are writing a robots.txt file for Google, keep in mind that its bots can and do change. When Google changes its settings, it provides specific instructions that you can view from your AdSense account.
Setting Up the robots.txt File
All you need to write a robots.txt file is Notepad or another plain-text editor.
Start by creating an empty file named robots.txt.
Add the following to that file:
User-agent: Mediapartners-Google*
Disallow:
Now save this file and upload it to your website's root directory.
This is the main directory that holds your index.html file, where people land when they type www.yoursitename.com into their browser.
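Bots look for the file at the root of the site, no matter which page they are about to visit. The sketch below shows how that address can be worked out from any page URL with Python's standard urllib.parse module. The function name robots_url and the sample page path are assumptions made for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the file always sits at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page deep inside the site from the example above.
print(robots_url("http://www.yoursitename.com/articles/page.html"))
# prints http://www.yoursitename.com/robots.txt
```

If your file ends up in a subdirectory instead, bots will not find it at this address.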
An empty Disallow line blocks nothing, so this rule allows Google's AdSense bot (Mediapartners-Google) to see your whole site. If you have other rules in your robots.txt file, this one should go on the top lines.
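If you want to check the rule before uploading it, one option is Python's standard urllib.robotparser module. This is only a sketch: the helper name is_allowed and the sample path are made up, and the agent name is written without the trailing asterisk because this particular parser compares agent names literally. Google's own parser may treat the file differently.

```python
from urllib.robotparser import RobotFileParser

# The rule from this article, minus the trailing asterisk (see note above).
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# The empty Disallow line leaves every path open to the AdSense bot.
print(is_allowed(ROBOTS_TXT, "Mediapartners-Google", "/any/page.html"))  # True
```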
You may also want to limit access to certain directories on your site from all bots.
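As a sketch of how that could look, the Python snippet below builds a robots.txt file that keeps the AdSense rule on the top lines and then adds a catch-all group that closes off a few directories. The function name build_robots_txt, the directory names /cgi-bin/ and /private/, and the bot name SomeOtherBot are assumptions for illustration. The result is checked with Python's standard urllib.robotparser module, which may not match every bot's behavior.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_dirs):
    """Return robots.txt text: AdSense rule first, then a block list for all bots."""
    lines = [
        "User-agent: Mediapartners-Google",  # written without the asterisk for this parser
        "Disallow:",
        "",
        "User-agent: *",
    ]
    lines += [f"Disallow: {directory}" for directory in blocked_dirs]
    return "\n".join(lines) + "\n"


# Hypothetical directories you might not want indexed.
robots_txt = build_robots_txt(["/cgi-bin/", "/private/"])
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Other bots are kept out of the listed directories only.
print(parser.can_fetch("SomeOtherBot", "/private/notes.html"))          # False
print(parser.can_fetch("SomeOtherBot", "/index.html"))                  # True
# The AdSense bot matches its own group, so it can still read everything.
print(parser.can_fetch("Mediapartners-Google", "/private/notes.html"))  # True
```

Remember that only nice bots will follow these rules.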
On other pages we will cover how to use meta tags to control bots on your site and how to block bad bots with a .htaccess file.
Remember that controlling who indexes your site is never perfect, but it is something you can do to improve the way your site appears to visitors in search engine indexes.