Blocking malicious crawlers

Main list:

# Return 403 Forbidden to the listed crawlers and to requests with an empty User-Agent (^$)
if ($http_user_agent ~ "hubspot|CCBot|VelenPublicWebCrawler|Konturbot|my-tiny-bot|eiki|webmeup|ExtLinksBot|Go-http-client|Python|ZoominfoBot|MegaIndex.ru|MauiBot|Amazonbot|ds-robot|intelx.io|coccocbot|FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|DotBot|heritrix|Bytespider|BLEXBot|Ezooms|JikeSpider|Barkrowler|InfoTigerBot|SemrushBot|DuckDuckGo-Favicons-Bot|^$") {
    return 403;
}
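Before reloading nginx, it can help to check which User-Agent strings the main list actually catches. The sketch below is a hypothetical test harness, not part of the nginx setup: it copies the pattern verbatim and uses Python's re.search, which behaves like nginx's "~" operator (a case-sensitive, unanchored regex match) for simple alternatives like these. The sample User-Agent strings are illustrative.

```python
import re

# The main list from the nginx rule, copied verbatim.
# nginx "~" is a case-sensitive, unanchored regex match; re.search mirrors that here.
MAIN_PATTERN = re.compile(
    "hubspot|CCBot|VelenPublicWebCrawler|Konturbot|my-tiny-bot|eiki|webmeup|"
    "ExtLinksBot|Go-http-client|Python|ZoominfoBot|MegaIndex.ru|MauiBot|Amazonbot|"
    "ds-robot|intelx.io|coccocbot|FeedDemon|Indy Library|Alexa Toolbar|AskTbFXTV|"
    "AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|"
    "Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|"
    "lightDeckReports Bot|YYSpider|DigExt|HttpClient|MJ12bot|DotBot|heritrix|"
    "Bytespider|BLEXBot|Ezooms|JikeSpider|Barkrowler|InfoTigerBot|SemrushBot|"
    "DuckDuckGo-Favicons-Bot|^$"
)


def is_blocked(user_agent: str) -> bool:
    """Return True if the main nginx rule would answer this User-Agent with 403."""
    return MAIN_PATTERN.search(user_agent) is not None


# Illustrative User-Agent strings (hypothetical samples, not taken from real logs).
SAMPLES = [
    "",  # empty User-Agent: caught by the ^$ alternative
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "Go-http-client/1.1",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]

for ua in SAMPLES:
    print(f"{is_blocked(ua)!s:5}  {ua!r}")
```

The first three samples are blocked; Googlebot and the Chrome browser string are not. Note that unescaped dots such as the one in "MegaIndex.ru" match any character, which is harmless here but worth keeping in mind when adding new entries.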

Small crawlers:

# Return 403 Forbidden to these smaller crawlers and to requests with an empty User-Agent (^$)
if ($http_user_agent ~ "Phpzhanqun|HostHarvest|python-requests|^$") {
    return 403;
}
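Because nginx's "~" is case-sensitive, the "Python" entry in the main list does not catch the lowercase "python-requests/..." User-Agent that the Python requests library sends by default; one plausible reason it is listed separately here. The sketch below demonstrates this with Python's re module as a stand-in for nginx matching; the version number in the sample User-Agent is illustrative.

```python
import re

# The small-crawler list from the second nginx rule, copied verbatim.
SMALL_PATTERN = re.compile("Phpzhanqun|HostHarvest|python-requests|^$")

# Default User-Agent format of the requests library (version number is illustrative).
UA = "python-requests/2.31.0"

# Case-sensitive, like nginx "~": the main list's "Python" entry misses this UA.
print(re.search("Python", UA) is not None)                 # False
# The explicit "python-requests" entry in the small list does match.
print(SMALL_PATTERN.search(UA) is not None)                # True
# Case-insensitive, like nginx "~*": "Python" would then match as well.
print(re.search("Python", UA, re.IGNORECASE) is not None)  # True
```

Switching the main rule to "~*" would make such separate lowercase entries unnecessary, at the cost of also matching lowercase substrings that were not intended.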

Amazonbot: Amazon describes it as follows: "Amazonbot is Amazon's web crawler used to improve our services, such as enabling Alexa to answer even more questions for customers. Amazonbot respects standard robots.txt rules." It is safe to block.
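Since Amazonbot states that it respects robots.txt, a robots.txt rule is a gentler alternative to the nginx 403. The sketch below assumes a hypothetical robots.txt that disallows Amazonbot site-wide and allows everyone else, and uses Python's standard urllib.robotparser to show how a compliant crawler would read it. This is Python's parser, not Amazon's, so treat it only as an approximation of Amazonbot's behaviour.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Amazonbot everywhere, allow all other crawlers.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() takes the crawler's product token and a URL (example.com is a placeholder).
print(parser.can_fetch("Amazonbot", "https://example.com/page"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/page"))  # True
```

A robots.txt rule only works for crawlers that choose to honour it; the nginx rule above is still needed for those that do not.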

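Go-http-client: this may be the crawler that Alibaba Cloud's (or Tencent Cloud's) whole-site acceleration service uses to determine the optimal route. It may also be any HTTP client written in Go (Go's net/http package sends "Go-http-client/1.1" as its default User-Agent), used by some other program for scraping (https://www.cnblogs.com/rxbook/p/15167301.html). It is not a normal browser, so it is blocked for now.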

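Bytespider: ByteDance's crawler. It crawls far too often, possibly because ByteDance is trying to build up its database quickly. Its market share outside China is low, so it is blocked for now, but it should be unblocked later.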

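Pro Sitemaps Generator: a sitemap-generation tool from pro-sitemaps.com. It adds load to the site. There is no need to add every such tool to the list in advance; add one when you run into it.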
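To see which crawlers you are actually running into, and copy their exact User-Agent strings into the lists, you can count User-Agents in the access log. The sketch below assumes nginx's default "combined" log format, in which $http_user_agent is the last double-quoted field on each line; the function name top_user_agents and the sample log lines are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: "combined" log format, User-Agent is the last quoted field on the line
# (nginx escapes double quotes inside logged values as \x22, so [^"]* is safe).
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def top_user_agents(lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count User-Agent strings in access-log lines and return the n most common."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if match:  # skip lines that do not end with a quoted field
            counts[match.group(1)] += 1
    return counts.most_common(n)


# Hypothetical sample lines in combined format.
SAMPLE_LOG = [
    '1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" '
    '"Mozilla/5.0 (compatible; AhrefsBot/7.0)"',
    '1.2.3.4 - - [10/Oct/2023:13:55:37 +0000] "GET /a HTTP/1.1" 200 612 "-" '
    '"Mozilla/5.0 (compatible; AhrefsBot/7.0)"',
    '5.6.7.8 - - [10/Oct/2023:13:55:38 +0000] "GET / HTTP/1.1" 200 612 "-" '
    '"Go-http-client/1.1"',
]

print(top_user_agents(SAMPLE_LOG, 2))
# [('Mozilla/5.0 (compatible; AhrefsBot/7.0)', 2), ('Go-http-client/1.1', 1)]
```

On a real server you would pass an open log file instead of SAMPLE_LOG; the log path depends on your nginx configuration. A custom log_format that does not end with the User-Agent field would need a different regex.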