Dear site owner,
Your site might have been visited by our
crawlers, with network addresses in the
range of 82.99.30.2 - 82.99.30.73. Here is a
short FAQ answering some of the questions
you might have:
What is the name of your crawler?
Our crawler does not have a "name" yet.
Instead, it announces itself as a standard
web browser: a "Mozilla 4.0" type of browser
compatible with Microsoft Internet Explorer
6.0, running on the Windows NT 5.1 operating
system. The reasons for this are: (a) Today,
web servers are intelligent enough to react
to the type of user agent. If our crawlers
had a name, say MunaxRob or something like
that, many web servers would not know about
it and would return junk or maybe nothing at
all. (b) We want the web server to return a
page that looks as close as possible to the
page a visitor would see in a standard web
browser. This gives the best possible
indexing in our database and a WYSIWYG
experience for anybody who visits our search
engine.
It is true that many of today's search
engines do well with a name set to 'Robot'
or something like that, but those search
engines are well known, and site owners have
given their crawlers a chance to retrieve
the best possible information. We want to be
given the same chance.
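Because the crawler presents itself as an
ordinary browser, the User-Agent header will
not single it out in your logs; the address
range given at the top of this FAQ is the
identifying signal. A minimal sketch of such
a check follows. The function name
is_munax_crawler is hypothetical, and the
range is assumed to be inclusive at both
ends.

```python
import ipaddress

# Address range quoted at the top of this FAQ
# (assumed inclusive at both ends).
CRAWLER_FIRST = ipaddress.IPv4Address("82.99.30.2")
CRAWLER_LAST = ipaddress.IPv4Address("82.99.30.73")


def is_munax_crawler(remote_addr: str) -> bool:
    """Return True if remote_addr lies inside the published range."""
    try:
        addr = ipaddress.IPv4Address(remote_addr)
    except ipaddress.AddressValueError:
        # Not a valid IPv4 address (malformed, or IPv6).
        return False
    return CRAWLER_FIRST <= addr <= CRAWLER_LAST
```

You could call this with the client address
taken from a log line or a request object to
separate crawler visits from ordinary
browser traffic.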
How often do you visit my site?
The interval between visits by our crawlers
to your site should be somewhere between 15
minutes and several days.
Do you store the things you index?
Unlike other search engines, we do not want
to steal the things you have on your site.
For instance, some search engines download
and convert your images and then display
them as thumbnails in their search results.
We just store the links to the images.
Additionally, since we only try to access
your images and do not download them, we
use practically none of your bandwidth.
For pages, we do the same thing as many
other search engines, i.e. our crawlers
download and store a copy of the page in the
database cache.
For other things, like video and audio, we
follow the rules accepted on the web and
take only a small snapshot: 3 to 4 seconds
of a video and about 16 seconds of audio.
Why do you supply the URL http://www.munax.com/referer.htm
as 'Referer'?
You might have set your web server to deny
access to things (images, for instance) on
your site unless the Referer is a page on
your web site. This is why the crawler
accesses your site with a Referer page
outside your site: the crawler wants to know
whether we will be denied linkage to the
images on your site. If so, the crawler must
lower the rank of your image link, or remove
it from the search index, to avoid
displaying a broken or missing image in the
search results.
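The Referer test described above can be
sketched as follows. This is an illustration
only, not the crawler's actual code: the
function names and the shape of the hotlink
rule are assumptions, and only the Referer
URL comes from this FAQ.

```python
from urllib.parse import urlsplit

# Referer URL the crawler sends, as stated in this FAQ.
CRAWLER_REFERER = "http://www.munax.com/referer.htm"


def image_request_allowed(referer: str, site_host: str,
                          hotlink_protection: bool) -> bool:
    """Hypothetical site-side rule: with hotlink protection on,
    serve an image only if the Referer is a page on the same host."""
    if not hotlink_protection:
        return True
    referer_host = urlsplit(referer).hostname or ""
    return referer_host.lower() == site_host.lower()


def crawler_decision(allowed: bool) -> str:
    """What the FAQ says happens to the image link afterwards."""
    if allowed:
        return "keep image link"
    return "lower rank or remove image link"
```

A site with hotlink protection refuses the
request because www.munax.com is not its own
host, so the image link is demoted or
dropped instead of showing up broken in the
search results.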
Do you honour the robots.txt protocol?
Yes, we do. However, the crawler will almost
always fetch the first page of the site,
i.e. the page at the root URL "/". This is
for ranking calculation reasons. When we
leave beta we will most likely change this
so that the first page is skipped too.
Also, if other sites link to multimedia on
your site, the crawler will index those
links, on the assumption that they must be
OK to index since other sites are allowed to
link to your site.
The crawlers will ignore a robots.txt file
if it is not correctly written.
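As a rough model of the behaviour described
above, the sketch below uses Python's
standard urllib.robotparser. It assumes
that, having no name of its own, the crawler
is matched only by "User-agent: *" rules,
and it simplifies "almost always" to
"always" for the root page. The function
may_fetch is hypothetical, and the exception
for externally linked multimedia is not
modelled.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, path: str) -> bool:
    """Approximate the FAQ's rules for a single path on a site."""
    # During beta the root page is fetched regardless of robots.txt
    # (the FAQ says "almost always"; simplified here to always).
    if path == "/":
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Assumption: the unnamed crawler falls under the "*" rules.
    return parser.can_fetch("*", path)


EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""
```

With EXAMPLE_ROBOTS_TXT, a path under
/private/ is refused while other paths and
the root page are fetched.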
How do I exclude my site from being indexed?
Remove NOSPAM from the email address info2@NOSPAMmunax.com
and send an email with the subject "Exclude
my site from indexing, code: 84jdur74ud". In
your email, state the full URL of the site.
Also, note that others might want to have
your site excluded, so be sure to use a
correct sender email address. It should have
the same domain name as the site you want to
exclude.
Because we are in beta, with many things to
do and many requests to serve, your site
might not be excluded until the next time we
crawl and index the web.
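Before sending an exclusion request, you can
check that your sender address satisfies the
same-domain rule stated above. The sketch
below is one possible reading of that rule:
the function name is hypothetical, and
ignoring a leading "www." on the site's host
name is an assumption, not something this
FAQ specifies.

```python
from urllib.parse import urlsplit


def sender_matches_site(sender: str, site_url: str) -> bool:
    """Return True if the sender's email domain equals the site's
    host name (with a leading "www." ignored, by assumption)."""
    _local, at, domain = sender.rpartition("@")
    if not at:
        return False  # not an email address
    host = (urlsplit(site_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return bool(host) and domain.lower() == host
```

For example, a request for
http://www.example.com/ sent from
webmaster@example.com passes this check,
while one sent from an unrelated domain does
not.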
Again, please note that we are in beta. We
try to correct things as soon as possible,
and we are sorry for any inconvenience our
crawlers might have caused you and your
site.