Apache error log output: Difference between revisions

From Genecats
Jump to navigationJump to search
(Adding hints on log locations)
(Adding Lou's tips about seeing an IP's actions)
Line 31: Line 31:


Ask Matt for more ways to use these logs, such as building a temporary directory of simlinks to the logs for specific date ranges to do more complicated querries.  
Ask Matt for more ways to use these logs, such as building a temporary directory of simlinks to the logs for specific date ranges to do more complicated querries.  
===Example of accessing /hive/data/inside/wwwstats/RR/ for IP===
Error log location for hgw2:
/hive/data/inside/wwwstats/RR/2019/hgw2
The last two directories are for $year and $machine
Within that directory:
<pre>
-rw-rw-r--  1 hiram genecats 498M Mar 28 10:40 access_log
-rw-rw-r--  1 hiram genecats 891M Mar 28 10:40 error_log
-rw-r--r--  1 hiram protein  54M Jan 14 03:10 access_log.20190106.gz
-rw-rw-r--  1 hiram genecats  48M Jan 20 03:30 access_log.20190114.gz
-rw-rw-r--  1 hiram genecats  58M Jan 28 03:41 access_log.20190120.gz
...
</pre>
There is an '''access_log''', which is the latest log (within the last week). Same for the error_log. Once a log is >week old, it gets compressed and the file name is 2019-month-day(start of week). Ex.  access_log.20190120.gz.
If you wanted to research what a recent offending IP was doing, you would check the different machines, in this case we saw there is some activity on hgw2, and do the following:
<pre>
$cat access_log | awk '{print $1}' | sort | uniq -c | sort -r | head
  33386 54.82.172.170
  28818 130.91.194.123
  16993 130.223.121.12 (Offender) (16993 hits on hgw2)
</pre>
====Ex. Summarizing specific 130.223.121.12  IP actions====
<pre>
$cat access_log | grep "130.223.121.12" | awk '{print $7}' | cut -f 3 -d '/' | cut -f 1 -d "?" | sort | uniq -c | sort -r | head
  16905 hgTables
    18 hgt
    14 hgTracks
</pre>
So this 130.223.121.12  IP is hammering hgTables.
====Ex. Checking specific 130.223.121.12 IP  requests====
<pre>
$cat access_log | grep "130.223.121.12" | head -n 3
130.223.121.12 - - [25/Mar/2019:08:12:27 -0700] "POST /cgi-bin/hgTables? HTTP/1.1" 200 1222212 "-" "Python-urllib/2.7" 1257605 microseconds
130.223.121.12 - - [25/Mar/2019:08:12:30 -0700] "POST /cgi-bin/hgTables? HTTP/1.1" 200 167045 "-" "Python-urllib/2.7" 2063636 microseconds
130.223.121.12 - - [25/Mar/2019:08:12:31 -0700] "POST /cgi-bin/hgTables? HTTP/1.1" 200 681933 "-" "Python-urllib/2.7" 2170569 microseconds
</pre>
The results show  Python-urllib/2.7 which shows they are using python to programmatically query, and the POST is a different way to input parameters without using the URL, so we don't know what they're searching for. We do see that they are doing multiple queries a minute without waiting for a response, which is more evident as they currently have an almost 60 minute bottleneck.


[[Category:Browser QA]]
[[Category:Browser QA]]
[[Category:Browser QA CGI]]
[[Category:Browser QA CGI]]

Revision as of 18:19, 28 March 2019

Here are some examples of errors we often see but that are of no concern:

This is seen right after a push of new CGIs and is caused by browsers having a stale version of the CGIs/css file. A bunch of these will show up right after the push, but they should subside after several minutes as browser grab new versions.

 Cannot find correct version of file 'HGStyle.css'; this is due to an installation error, referer: http://hgw0.gi.ucsc.edu/
 File does not exist: /data/apache/htdocs/js/jquery.imgareaselect-v291.js, referer: http://genome.ucsc.edu/cgi-bin/hgTracks

We also see these often but they tend to right themselves as the load on the browser waxes and wanes.

 Too many popAbortHandlers, referer: http://genome.ucsc.edu/cgi-bin/hgTables?position=chr4:150237197-150248873&hgsid=353372405&refGene=pack&hgFind.matches=NM_023119,

How to look at logs

When you are on a machine you can look at the logs by following this file:

/usr/local/apache/logs/error_log

Older logs with names like this:

/usr/local/apache/logs/error_log.1
/usr/local/apache/logs/error_log.2

Example tail command:

tail -f /usr/local/apache/logs/error_log | grep -v "CGI_TIME\|trackLog"

How to find historic logs

If you wanted to look at historical stuff there is a place here on hive:

/hive/data/inside/wwwstats/RR/2018/
/hive/data/inside/wwwstats/RR/2019/

Let's say you wanted to look at specific events on Feb. 14th on the hgw1 machine for 2019:

zgrep "Feb 14" /hive/data/inside/wwwstats/RR/2019/hgw1/error_log.20190211.gz | less

Ask Matt for more ways to use these logs, such as building a temporary directory of simlinks to the logs for specific date ranges to do more complicated querries.


Example of accessing /hive/data/inside/wwwstats/RR/ for IP

Error log location for hgw2:

/hive/data/inside/wwwstats/RR/2019/hgw2

The last two directories are for $year and $machine

Within that directory:

-rw-rw-r--  1 hiram genecats 498M Mar 28 10:40 access_log
-rw-rw-r--  1 hiram genecats 891M Mar 28 10:40 error_log
-rw-r--r--  1 hiram protein   54M Jan 14 03:10 access_log.20190106.gz
-rw-rw-r--  1 hiram genecats  48M Jan 20 03:30 access_log.20190114.gz
-rw-rw-r--  1 hiram genecats  58M Jan 28 03:41 access_log.20190120.gz
...

There is an access_log, which is the latest log (within the last week). Same for the error_log. Once a log is >week old, it gets compressed and the file name is 2019-month-day(start of week). Ex. access_log.20190120.gz.


If you wanted to research what a recent offending IP was doing, you would check the different machines, in this case we saw there is some activity on hgw2, and do the following:

$cat access_log | awk '{print $1}' | sort | uniq -c | sort -r | head

  33386 54.82.172.170
  28818 130.91.194.123
  16993 130.223.121.12 (Offender) (16993 hits on hgw2)

Ex. Summarizing specific 130.223.121.12 IP actions

$cat access_log | grep "130.223.121.12" | awk '{print $7}' | cut -f 3 -d '/' | cut -f 1 -d "?" | sort | uniq -c | sort -r | head

  16905 hgTables
     18 hgt
     14 hgTracks

So this 130.223.121.12 IP is hammering hgTables.

Ex. Checking specific 130.223.121.12 IP requests

$cat access_log | grep "130.223.121.12" | head -n 3

130.223.121.12 - - [25/Mar/2019:08:12:27 -0700] "POST /cgi-bin/hgTables? HTTP/1.1" 200 1222212 "-" "Python-urllib/2.7" 1257605 microseconds
130.223.121.12 - - [25/Mar/2019:08:12:30 -0700] "POST /cgi-bin/hgTables? HTTP/1.1" 200 167045 "-" "Python-urllib/2.7" 2063636 microseconds
130.223.121.12 - - [25/Mar/2019:08:12:31 -0700] "POST /cgi-bin/hgTables? HTTP/1.1" 200 681933 "-" "Python-urllib/2.7" 2170569 microseconds

The results show Python-urllib/2.7 which shows they are using python to programmatically query, and the POST is a different way to input parameters without using the URL, so we don't know what they're searching for. We do see that they are doing multiple queries a minute without waiting for a response, which is more evident as they currently have an almost 60 minute bottleneck.