Discussion:
Some sort of "pollution" between two Virtual Hosts on the same machine, causes Google to look on site A for files on site B
Dr. David Kirkby
2018-07-03 12:24:27 UTC
I'm running Apache on a Debian 9 system.

***@localhost:~# apache2ctl -v
Server version: Apache/2.4.25 (Debian)
Server built: 2018-03-31T08:47:16

on a virtual private server, with one IP address. I have about 6 virtual
hosts on there. One is

https://www.g8wrb.org/

which has a directory 'data', with valve data sheets in it.

So for example, there's a file
https://www.g8wrb.org/data/Eimac/4CX10000D.pdf

If Googlebot goes around looking for that it will find it. The problem is,
Googlebot is looking on another domain

https://www.kirkbymicrowave.co.uk/

for the same files. For example, the last line of the logs below shows
Googlebot requesting

/data/Eimac/4CX10000D.pdf

on the https://www.kirkbymicrowave.co.uk/ domain, despite the fact that the
file has never been on that website. It seems as though Google is mixing
the two sites up in some way, and hunting on one domain for files that
should be (and are) on another domain hosted on the same server.

Needless to say, when I look with Google Analytics, I see a ton of 404
errors, as Google can't find the files it is looking for on
https://www.kirkbymicrowave.co.uk/, which is hardly surprising, as they
were never there.

Can anyone explain what might be happening? I have posted the four
VirtualHosts related to the https://www.kirkbymicrowave.co.uk/ domain
below. There are four, to cover the four possibilities: the domain with
and without the www, each on the non-secure port 80 and on the secure
port 443.

access-kirkbymicrowave.co.uk.log.6:66.249.66.66 - - [16/Jun/2018:06:11:01 +0000] "GET /complete-list.php/thanks/data/HP/data/Machlett_Laboratories/data/Eimac/3CX10000H3.pdf HTTP/1.1" 404 575 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
access-kirkbymicrowave.co.uk.log.6:66.249.66.68 - - [16/Jun/2018:06:14:45 +0000] "GET /complete-list.php/thanks/data/HP/data/Machlett_Laboratories/data/Eimac/AB5.pdf HTTP/1.1" 404 568 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
access-kirkbymicrowave.co.uk.log.6:66.249.66.70 - - [16/Jun/2018:06:22:27 +0000] "GET /complete-list.php/thanks/data/HP/data/Machlett_Laboratories/data/Eimac/4CX5000R.pdf HTTP/1.1" 404 573 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
access-kirkbymicrowave.co.uk-SSL.log.4:66.249.64.64 - - [28/Jun/2018:22:32:18 +0000] "GET /data/Eimac/4-125A.pdf HTTP/1.1" 404 6325 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
access-kirkbymicrowave.co.uk-SSL.log.4:66.249.64.67 - - [28/Jun/2018:22:45:01 +0000] "GET /data/Eimac/4CX10000D.pdf HTTP/1.1" 404 6325 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
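
To see which missing paths Googlebot asks for most often, log entries like the ones above can be tallied with a short script. This is only a minimal sketch: it assumes the "combined" log format with an optional "filename:" prefix (as produced by grep across several files), and the names LOG_RE and googlebot_404s are made up for illustration.

```python
import re
from collections import Counter

# Combined log format, optionally prefixed with "filename:" (grep output).
LOG_RE = re.compile(
    r'^(?:(?P<file>[^:\s]+):)?'
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) \S+ \S+ '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def googlebot_404s(lines):
    """Count 404 responses per request path for user agents naming Googlebot."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m and m["status"] == "404" and "Googlebot" in m["agent"]:
            counts[m["path"]] += 1
    return counts


# Three of the entries quoted above, each on a single line.
SAMPLE = [
    'access-kirkbymicrowave.co.uk.log.6:66.249.66.68 - - [16/Jun/2018:06:14:45 +0000] '
    '"GET /complete-list.php/thanks/data/HP/data/Machlett_Laboratories/data/Eimac/AB5.pdf HTTP/1.1" '
    '404 568 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    'access-kirkbymicrowave.co.uk-SSL.log.4:66.249.64.64 - - [28/Jun/2018:22:32:18 +0000] '
    '"GET /data/Eimac/4-125A.pdf HTTP/1.1" '
    '404 6325 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    'access-kirkbymicrowave.co.uk-SSL.log.4:66.249.64.67 - - [28/Jun/2018:22:45:01 +0000] '
    '"GET /data/Eimac/4CX10000D.pdf HTTP/1.1" '
    '404 6325 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for path, n in googlebot_404s(SAMPLE).most_common():
        print(n, path)
```

The user-agent string is not proof that a request really came from Google, so the tally only shows what clients calling themselves Googlebot requested.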


<VirtualHost *:443>
# The ServerName directive sets the request scheme, hostname and port that
# the server uses to identify itself. This is used when creating
# redirection URLs. In the context of virtual hosts, the ServerName
# specifies what hostname must appear in the request's Host: header to
# match this virtual host. For the default virtual host (this file) this
# value is not decisive as it is used as a last resort host regardless.
# However, you must set it for any further virtual host explicitly.
ServerName www.kirkbymicrowave.co.uk

ServerAdmin ***@kirkbymicrowave.co.uk
DocumentRoot /var/www/html/kirkbymicrowave.co.uk

SetOutputFilter DEFLATE
SetEnvIfNoCase Request_URI "\.(?:gif|jpe?g|png)$" no-gzip

# Available loglevels: trace8, ..., trace1, debug, info, notice, warn,
# error, crit, alert, emerg.
# It is also possible to configure the loglevel for particular
# modules, e.g.
#LogLevel info ssl:warn

ErrorLog ${APACHE_LOG_DIR}/error-kirkbymicrowave.co.uk-SSL.log
CustomLog ${APACHE_LOG_DIR}/access-kirkbymicrowave.co.uk-SSL.log combined

SSLEngine on
SSLCertificateKeyFile /etc/ssl/private/www_kirkbymicrowave_co_uk.key
SSLCertificateFile /etc/ssl/ssl.crt/www_kirkbymicrowave_co_uk.crt
SSLCertificateChainFile /etc/ssl/ssl.crt/www_kirkbymicrowave_co_uk.ca-bundle
# For most configuration files from conf-available/, which are
# enabled or disabled at a global level, it is possible to
# include a line for only one particular virtual host. For example the
# following line enables the CGI configuration for this host only
# after it has been globally disabled with "a2disconf".
#Include conf-available/serve-cgi-bin.conf

ErrorDocument 404 /error-pages/404.html
ErrorDocument 410 /error-pages/410.html
ErrorDocument 500 /error-pages/500.html
ErrorDocument 503 /error-pages/503.html
</VirtualHost>

<VirtualHost *:80>
# Redirect www.kirkbymicrowave.co.uk on port 80 to the https site.
ServerName www.kirkbymicrowave.co.uk
ServerAdmin ***@kirkbymicrowave.co.uk
ErrorLog ${APACHE_LOG_DIR}/error-www.kirkbymicrowave.co.uk-port-80.log
CustomLog ${APACHE_LOG_DIR}/access-www.kirkbymicrowave.co.uk-port-80.log combined
Redirect "/" "https://www.kirkbymicrowave.co.uk/"
</VirtualHost>

<VirtualHost *:80>
# Redirect kirkbymicrowave.co.uk on port 80 to the https site.
ServerName kirkbymicrowave.co.uk
ServerAdmin ***@kirkbymicrowave.co.uk
ErrorLog ${APACHE_LOG_DIR}/error-kirkbymicrowave.co.uk-port-80.log
CustomLog ${APACHE_LOG_DIR}/access-kirkbymicrowave.co.uk-port-80.log combined
Redirect "/" "https://www.kirkbymicrowave.co.uk/"
</VirtualHost>


<VirtualHost *:443>
# Redirect kirkbymicrowave.co.uk on port 443 to the www. site.
ServerName kirkbymicrowave.co.uk
SSLEngine on
SSLCertificateKeyFile /etc/ssl/private/www_kirkbymicrowave_co_uk.key
SSLCertificateFile /etc/ssl/ssl.crt/www_kirkbymicrowave_co_uk.crt
SSLCertificateChainFile /etc/ssl/ssl.crt/www_kirkbymicrowave_co_uk.ca-bundle
ServerAdmin ***@kirkbymicrowave.co.uk
ErrorLog ${APACHE_LOG_DIR}/error-kirkbymicrowave.co.uk-port-443.log
CustomLog ${APACHE_LOG_DIR}/access-kirkbymicrowave.co.uk-port-443.log combined
Redirect "/" "https://www.kirkbymicrowave.co.uk/"
</VirtualHost>
Matt Sicker
2018-07-03 14:50:40 UTC
I believe you have the wrong mailing list. Take a look at
<http://httpd.apache.org/lists.html> for the proper user list for Apache
HTTP Server.

--
Matt Sicker <***@gmail.com>