Full robots.txt syntax

> You can [find the full robots.txt syntax here][1]. Some parts of the robots.txt syntax are tricky and deserve close study, so be sure to read the entire document.

# Useful robots.txt rules
> Here are some common, useful robots.txt rules:

<table class="nice-table">
 <tbody>
 <tr>
 <th class="wide">Rule</th>
 <th class="wide" style="width:50%">Sample</th>
 </tr>
 <tr>
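 <td class="tab0">Disallow crawling of the entire website. Keep in mind that in some situations Google may still index URLs from the website even though it has not crawled them. Note: this rule does not match the various AdsBot crawlers, which must be named explicitly.</td>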
 <td class="tab0">
 <pre>User-agent: *
Disallow: /
</pre>
 </td>
 </tr>
 <tr>
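 <td class="tab0">Disallow crawling of a directory and its contents by following the directory name with a forward slash. Do not use robots.txt to block access to private content; use proper authentication instead. Google may still index URLs disallowed by robots.txt without crawling them, and because anyone can view the robots.txt file, it may reveal the location of your private content.</td>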
 <td class="tab0">
 <pre>User-agent: *
Disallow: /calendar/
Disallow: /junk/
</pre>
 </td>
 </tr>
 <tr>
 <td class="tab0">Allow access to a single crawler only</td>
 <td class="tab0">
 <pre>User-agent: Googlebot-news
Allow: /

User-agent: *
Disallow: /
</pre>
 </td>
 </tr>
 <tr>
 <td class="tab0">Allow access to all crawlers except one</td>
 <td class="tab0">
 <pre>User-agent: Unnecessarybot
Disallow: /

User-agent: *
Allow: /
</pre>
 </td>
 </tr>
 <tr>
 <td class="tab0">
 Disallow crawling of a single web page (list the page after the slash):
 </td>
 <td class="tab0">
 <pre>User-agent: *
Disallow: /private_file.html</pre>
 </td>
 </tr>
 <tr>
 <td class="tab0">
 Block a specific image from Google Images:
 </td>
 <td class="tab0">
 <pre>User-agent: Googlebot-Image
Disallow: /images/dogs.jpg</pre>
 </td>
 </tr>
 <tr>
 <td class="tab0">
 Block all images on your site from Google Images:
 </td>
 <td class="tab0">
 <pre>User-agent: Googlebot-Image
Disallow: /</pre>
 </td>
 </tr>
 <tr>
 <td class="tab0">
 Disallow crawling of files of a specific type (for example, <code>.gif</code>):
 </td>
 <td class="tab0">
 <pre>User-agent: Googlebot
Disallow: /*.gif$</pre>
 </td>
 </tr>
 <tr>
 <td class="tab0">
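 Disallow crawling of the entire site, but still show AdSense ads on its pages (disallow all web crawlers other than Mediapartners-Google). This keeps your pages out of search results, but the Mediapartners-Google web crawler can still analyze them to decide which ads to show to visitors on your site.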
 </td>
 <td class="tab0">
 <pre>User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Allow: /</pre>
 </td>
 </tr>
 <tr>
 <td class="tab0">Match URLs that end with a specific string by using the dollar sign (<code>$</code>). For example, the sample code blocks all URLs that end with <code>.xls</code>:</td>
 <td class="tab0">
 <pre>User-agent: Googlebot
Disallow: /*.xls$
</pre>
 </td>
 </tr>
 </tbody>
</table>
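
To sanity-check rules like the ones in the table before deploying them, a short script can help. The following is a minimal sketch, not Google's implementation. It assumes Python's standard-library `urllib.robotparser` for the plain prefix rules. As far as I know, that module does not interpret the `*` and `$` wildcards, so the `path_matches` helper below is a hypothetical stand-in that approximates the wildcard behavior described in the table. The `example.com` URLs and the `AnyBot` / `OtherBot` agent names are placeholders.

```python
import re
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check prefix-only rules with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def path_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: match a URL path against one rule value.

    '*' stands for any run of characters; a trailing '$' anchors the
    pattern to the end of the URL. Without '$' the pattern is a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    return re.match(regex, path) is not None


# Rules copied from the table above.
DIRECTORY_RULES = """\
User-agent: *
Disallow: /calendar/
Disallow: /junk/
"""

ONLY_NEWS_BOT = """\
User-agent: Googlebot-news
Allow: /

User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    print(can_fetch(DIRECTORY_RULES, "AnyBot", "https://example.com/calendar/2024"))
    print(can_fetch(ONLY_NEWS_BOT, "Googlebot-news", "https://example.com/story"))
    print(path_matches("/*.gif$", "/images/dogs.gif"))
    print(path_matches("/*.xls$", "/reports/q1.xlsx"))
```

The two helpers are kept separate on purpose: the standard-library parser handles group selection by user agent, while the regex helper only evaluates a single rule value. Real crawlers also resolve conflicts between `Allow` and `Disallow` rules, which this sketch does not attempt.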

> Reposted from: https://support.google.com/webmasters/answer/6062596?hl=zh-Hans&ref_topic=6061961

[1]: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
