Tomfoolery Logging Bot hits
#1
Hello.

As I'm new, I'm still learning what I can about UserSpice. I haven't found anything that addresses this subject.

Issue: I've noticed constant logging of certain IPs in my log, and after some research I found they are search engine web crawlers. I'd like the log to omit entries from known bots (Google, Amazon, etc.).

Thoughts: I was thinking of exploring the tomfool PHP file and support files to see if I could put an exception in, but beforehand I was interested in seeing if anyone else has done this already. Another thought is to explore adding code to the .htaccess files in select directories to block bots from scanning those directories.
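Something like this is what I had in mind, as a rough sketch only. I don't know the actual logging function UserSpice uses, so log_visit() and the bot list here are just placeholders for illustration:

<pre>
Code:
<?php
// Rough sketch: skip the log entry when the user agent looks like a known crawler.
// log_visit() is a placeholder, not an actual UserSpice function.
$botPatterns = ['googlebot', 'bingbot', 'baiduspider', 'yandexbot', 'duckduckbot'];

$userAgent = strtolower($_SERVER['HTTP_USER_AGENT'] ?? '');

$isBot = false;
foreach ($botPatterns as $pattern) {
    if (strpos($userAgent, $pattern) !== false) {
        $isBot = true;
        break;
    }
}

if (!$isBot) {
    log_visit($_SERVER['REMOTE_ADDR']); // only log real visitors
}
</pre>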

Your thoughts? Thank you for any insight you can provide on this. I just don't want my log bloated with bot hits.

JDM
#2
We can definitely make a whitelist. I actually don't have that problem on my projects, so it could be some sort of server configuration option.

My thought is that we can add a table of IP addresses that should be excluded from the log (something like the sketch below). Just make sure not to put your own in it!
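Roughly along these lines, just to show the idea (the table name, columns, and connection details are placeholders, not a final schema):

<pre>
Code:
<?php
// Rough sketch: check an exclusion table before writing the log entry.
// 'log_ip_exclusions' and the connection details are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=userspice', 'db_user', 'db_pass');

$ip = $_SERVER['REMOTE_ADDR'];

$stmt = $pdo->prepare('SELECT COUNT(*) FROM log_ip_exclusions WHERE ip = ?');
$stmt->execute([$ip]);

if ((int) $stmt->fetchColumn() === 0) {
    // IP is not excluded, so write the log entry as usual.
}
</pre>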

It's going to be 5 weeks until I can get back to active development (I run a camp and we're right in the middle of camp season), but this is doable.
#3
Very odd that you are getting forked with these crawlers! I haven't had any issues on either of my US projects... However, if it is happening, adding an option to omit them would be great.
#4
I'm an amateur programmer... still have lots to learn. So it could be something I can turn off on my server, or it could be that I'm on a European service provider's server.
#5
I don't have any special configuration on my cPanel VPS other than ConfigServer Firewall... I am wondering if this is why I don't get crawled...
#6
You generally do this in a robots.txt file; however, if that fails to stop them, you can use the following in the .htaccess for the directory you don't want crawled:

<pre>
Code:
RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule .* - [R=403,L]
</pre>


Change the bots to the ones you want stopped (they aren't case sensitive), and you can change the error code to whatever you wish as well; currently it is 403 Forbidden.
#7
Firestorm, unfortunately neither seems to work. I tried the .htaccess first and it crashed my site. Now I have the robots.txt file in a few folders and they seem to ignore it. However, I only have the basic entry:

User-agent: *
Disallow: /