Robots.txt is a text-based file created by webmasters to instruct search engine crawlers (also called web robots) how to crawl and index a site. It is mainly used to tell them which pages they are allowed to visit and which pages are restricted.
Robots.txt is also known as the Robots Exclusion Protocol (REP), and the file must live in the root folder of a domain (e.g. www.domain.com/robots.txt).
SEO experts widely use this file to guide search engine crawlers.
What terms should we be familiar with when handling Robots.txt?
User-agent: This refers to the web robots or search engine crawlers. You can call out each crawler specifically, or, as most SEOs do, use an asterisk (*) to address all crawlers at once.
Disallow: This specifies the location you do not want the user-agent to crawl; it can be a specific file or folder.
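Putting the two directives together, a minimal robots.txt might look like this (the /private/ folder is just a placeholder):

```
User-agent: *
Disallow: /private/
```

Here every crawler (*) is told to stay out of the /private/ folder while the rest of the site remains crawlable.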
Get familiar with the types of web robots that can crawl your website, since certain conditions may require you to do things such as:
Disallowing or excluding a single web robot from crawling your site
Allowing only a single web robot to crawl your site
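As a sketch of both cases, using Googlebot purely as an example crawler name: to exclude a single robot from the whole site,

```
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
```

and, conversely, to allow only that robot and exclude everyone else,

```
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
```

An empty Disallow line means nothing is blocked for that user-agent, while `Disallow: /` blocks the entire site.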
We can also exclude specific server folders from being crawled and indexed.
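For example, with each folder on its own Disallow line (the folder names here are hypothetical):

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /admin/
```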
Also, if you have an XML sitemap, it can be referenced in robots.txt as well.
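The sitemap is referenced with a Sitemap directive and a full URL; using the example domain from above:

```
Sitemap: https://www.domain.com/sitemap.xml
```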
A robots.txt file combines these directives into a single plain-text document.
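Putting it all together, a hypothetical robots.txt (the folder names are placeholders) could read:

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /admin/

Sitemap: https://www.domain.com/sitemap.xml
```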
What about the Meta Robots Tag?
This is another good option for handling web robots: it is placed in the page's head area, specifically as a meta tag, where we can do the following.
<meta name="robots" content="noindex,nofollow">
This tells crawlers not to index the page and not to follow the links on it.
<meta name="robots" content="index,nofollow">
This allows the page to be indexed but tells crawlers not to follow the links on it.
<meta name="robots" content="index,follow">
This allows the page to be indexed and tells crawlers to follow the links on it, which is also the default behavior when no meta robots tag is present.
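For placement, the meta robots tag belongs inside the page's head section; a minimal sketch of a page using the last variant:

```html
<!DOCTYPE html>
<html>
<head>
  <title>Example page</title>
  <!-- Allow indexing and let crawlers follow the links on this page -->
  <meta name="robots" content="index,follow">
</head>
<body>
  ...
</body>
</html>
```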