Munax Search Technology
With Munax you can forget the struggle of organizing your
documents. The same goes for everything else that can be indexed:
images, videos and sound files, executables, source code,
compressed files and email addresses. You can spread the data freely
over your hard disks, over the computers in your office, over the
World Wide Web, or over any mix of the three.
Below, we disclose just a few of the things that make the Munax
Search Technology unique.
Key features include
· Extreme performance
Munax uses the hardware to its full capacity to deliver maximum
performance, both when indexing and when querying, whether Munax is
installed on one computer or on several. You can also easily let the
Munax system grow as your business grows.
· Unlimited scaling
The modular subsystem architecture of Munax provides linear
scalability across machines and subsystems. A Munax system can be
scaled from a single machine running Munax in automatic mode, up to
any number of machines. The linear scaling applies both to indexing
and to querying, so Munax can scale to handle any number of
queries.
· Easy to manage
It takes only one person to manage a Munax system distributed over
thousands of computers, thus minimizing the management cost. Munax
can also run unattended when set to Automatic Mode.
· Equalizer ranking (EQR)
Munax combines several state-of-the-art ranking algorithms to
produce a final ranking. In real time, the EQR algorithms use
hundreds of parameters together with the index data to put the most
relevant search results first, whether the search is done on a local
network or on the World Wide Web.
· Any type of information is one click away (composite search)
Munax is the first search engine to offer composite search. With this
feature, you do not have to be explicit about what you are searching
for (documents, images, sound files ...). Just type in your query
and Munax will find all the various documents and objects and list
them on the same result page, i.e. a composite search result.
· Data protection
Munax is designed to limit system vulnerability. Through the use of
security features, you can have all your documents encrypted
and also set rules for which machines and users may access the
search engine. This way, your data is protected against access by
hackers or other intruders.
· Automated backup
Site search or corporate search, it does not matter. If you want the
search system to back up your important pages and documents, it will
do so.
· Internet distributable
You may choose to combine Munax Internet distribution with Munax
mirroring, which makes it almost impossible to destroy the system.
· Wide range of supported file types for indexing
Full indexing, for instance: htm html shtm shtml jhtm asp php php3
pdf ps doc xls ppt rtf wp wp5 wp6 wpd txt c cpp and h. More types of
documents for full indexing are easily added.
Meta indexing, for instance: gif, jpg, tga, bmp, iff, img, jif, mac,
msp, pcx, pic, tif, ico, jpe, mp3, wav, ram, snd, mp4, aif, mid, vqf,
la1, lav, mp2, avi, mpg, mpeg, rm, qt, asx, mov, fli, flc, eps, wri,
asc, fmk, for, zip, gzip, tar, arc, lzh, sit, rar, arj, dd, tgz, lha,
exe, hqx, dll, vbs, vxd, bat, cmd, class, jar, java, jav and email
addresses.
Munax knows what type of files these are and groups them
accordingly.
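
The grouping by file type described above can be pictured with a
small sketch. This is only an illustration, not Munax code: the
group names follow the object types named in this document, while
the table GROUPS, the functions group_of and group_files, and the
assignment of each extension to a group are assumptions.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical grouping table. Which extension belongs to which group is an
# assumption made for illustration; only a few extensions are listed.
GROUPS = {
    "image": {"gif", "jpg", "jpe", "tif", "bmp", "pcx", "tga", "ico"},
    "sound": {"mp3", "wav", "mid", "aif", "snd", "mp2"},
    "movie": {"avi", "mpg", "mpeg", "mov", "qt"},
    "compressed": {"zip", "gzip", "tar", "rar", "arj", "lzh", "tgz"},
    "software": {"exe", "dll", "bat", "cmd", "jar", "class"},
}


def group_of(name: str) -> str:
    """Return the group for a file name, based on its last extension."""
    ext = PurePosixPath(name).suffix.lower().lstrip(".")
    for group, extensions in GROUPS.items():
        if ext in extensions:
            return group
    return "other"


def group_files(names):
    """Collect file names into lists keyed by group, keeping input order."""
    grouped = defaultdict(list)
    for name in names:
        grouped[group_of(name)].append(name)
    return dict(grouped)
```

Calling group_files on a list of link targets returns one list per
group, which is roughly what a result page needs in order to show
objects of the same kind together.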
External interfaces
Munax is designed to be easily adapted to each customer's situation.
This provides the customer with the right solution for their current
needs as well as the ability to expand the system capabilities as
their requirements change.

Distributing execution power
Munax execution is logically distributed over subsystems, not
machines. One machine can host several subsystems and/or one
subsystem can span several machines. This way, Munax can be
distributed freely and each machine's execution power can be
utilized to its maximum. For instance, one very powerful machine can
have three full subsystems running while another machine may have
only a dictionary server running.
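
The many-to-many relation between machines and subsystems can be
sketched as a simple placement table. This is a hypothetical model
for illustration only: the host names, subsystem names, the
PLACEMENT records and both helper functions are assumptions, and the
real Munax configuration format is not shown in this document.

```python
from collections import defaultdict

# Hypothetical placement records: (machine, subsystem, part hosted there).
# "big-host" runs three full subsystems; "small-host" runs only the
# dictionary server of a fourth subsystem that spans two machines.
PLACEMENT = [
    ("big-host", "subsystem-a", "full"),
    ("big-host", "subsystem-b", "full"),
    ("big-host", "subsystem-c", "full"),
    ("small-host", "subsystem-d", "dictionary server"),
    ("third-host", "subsystem-d", "other parts"),
]


def subsystems_by_machine(placement):
    """Map each machine to the set of subsystems it hosts (fully or partly)."""
    result = defaultdict(set)
    for machine, subsystem, _part in placement:
        result[machine].add(subsystem)
    return dict(result)


def machines_by_subsystem(placement):
    """Map each subsystem to the set of machines it is spread over."""
    result = defaultdict(set)
    for machine, subsystem, _part in placement:
        result[subsystem].add(machine)
    return dict(result)
```

Reading the table in both directions shows the two cases named in
the text: one machine hosting several subsystems, and one subsystem
spanning several machines.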

Deploying Munax
Munax may be deployed and scaled in many ways, but four basic
configurations are usually recommended:
The single onemachine system consists of one machine only, having
two databases, one for queries and the other for update-indexing.
The machine has the Munax web server integrated, i.e. the Front-End.
The double onemachine system consists of two machines, each having
one database. One of the machines is used for queries while the other
is busy update-indexing. Both machines have the Munax web server
integrated. Alternatively, if another machine is added to run the
Munax web server (the Front-End), this system becomes a double
onemachine system with Front-End, and the database machines become
the Back-Ends. A double onemachine system provides higher
performance as well as higher redundancy.
In a single multimachine system, Munax is distributed over
several machines, and one (or every) machine also acts as a
Front-End.
In the figure below, the double multimachine system with Front-Ends
is pictured. It provides very high performance and redundancy. One
of the Back-End systems is used for queries while the other is busy
update-indexing. The computers in the picture are located on a local
area network but Munax can just as well be distributed on computers
spread over the Internet.
In all systems, Munax can be managed from the Munax Management
console program on a single PC, from anywhere.

Search features
Words & operators: up to 64 individual words can be combined with
phrases; + and - include or exclude words; words are combined with
an automatic AND; NEAR is obsolete because Munax automatically ranks
by proximity; a # hash mark before a word finds it in meta
information tags.
Composite search means not having to be explicit about what type of
information (documents, images, sound files ...) to search for. Just
type in the query and Munax will find all the various documents and
objects and list them on the same result page, i.e. a composite
search result.
Multimedia pre-view / pre-listen, i.e. audio can be pre-listened to
and videos can be pre-viewed, so the visitor can quickly decide
whether this is what he/she is looking for.
Image magnification. Any image in the search results can be
instantly magnified.
Search within a site across top domains. This means that a search
can be done, for instance, within all Microsoft sites without
stating the top domain.
Within top domain. Stating a specific top domain narrows the search
to that top domain only. For instance, if the site = Microsoft
and top domain = de, the search will be done only within the site
microsoft.de.
Offensive content exclusion.
Robots excluded search. Munax crawls robots-excluded documents too,
if parameterized that way. It is then up to the visitor of Munax to
decide whether he/she wants to view such documents.
Select document types: This includes the selection of any type, all
non-html types, all html types, or any other type of document that
Munax has been parameterized to crawl.
Language selection: Show only documents written in a selected
language.
Ranking combinations: If enabled by the system manager, a visitor
can choose which ranking algorithms Munax should combine.
Combinations include Relevance, docVote, Proximity and Multimedia.
Multimedia ranking: If selected, Munax will give a higher rank to
documents (html pages) that have rich multimedia content.
Dates: Search can be done on documents before or after a given date,
as well as between dates.
Demand objects: Check boxes are provided so the visitor can demand
that only pages containing links of specific types are searched,
i.e. images, sound, software, movie files, other types of documents,
compressed files and/or email addresses.
Object types: When explicitly searching for objects, i.e. images,
sound, software, movie files, other type of documents or compressed
files, their file extension can be specified.
Object peek: For html searches, Munax displays tiny icons to show
what link objects there are on the html page, i.e. images, sound,
software, movie files, other type of documents, compressed files
and/or email addresses.
Document type: For each query result, Munax displays the type of
document and its appropriate extension.
View doc: The visitor can click on this link to request Munax to
fetch the document from the document repository instead of fetching
it from the web. This way, "binary" documents like PDF and Word
documents can be viewed directly in the browser.
RTB: The visitor can click this link to view html pages with html
tags stripped away.
Query result word highlighting: For each query result, the searched
words are highlighted.
View document words highlighting: When clicking View doc (to order
Munax to fetch a document from the document repository) all searched
words will be colorized in the viewed document.
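
The word and operator rules listed under "Words & operators" can be
sketched as a small query parser. This is a minimal sketch under
stated assumptions, not the Munax parser: it assumes phrases are
written in double quotes, assumes the 64-word limit counts every
word including those inside phrases, and the names ParsedQuery,
parse_query, TOKEN and MAX_WORDS are hypothetical.

```python
import re
from dataclasses import dataclass, field

MAX_WORDS = 64  # limit stated in the feature list

# One token: an optional +, - or # prefix, then a quoted phrase or a bare word.
# Double quotes as phrase delimiters are an assumption.
TOKEN = re.compile(r'([+\-#]?)(?:"([^"]*)"|(\S+))')


@dataclass
class ParsedQuery:
    required: list = field(default_factory=list)  # plain and + terms (automatic AND)
    excluded: list = field(default_factory=list)  # - terms
    meta: list = field(default_factory=list)      # # terms, for meta information tags


def parse_query(text: str) -> ParsedQuery:
    """Split a query string into required, excluded and meta terms."""
    query = ParsedQuery()
    word_count = 0
    for prefix, phrase, word in TOKEN.findall(text):
        term = phrase or word
        if not term.strip():
            continue  # skip empty phrases such as ""
        word_count += len(term.split())
        if word_count > MAX_WORDS:
            raise ValueError(f"query has more than {MAX_WORDS} words")
        if prefix == "-":
            query.excluded.append(term)
        elif prefix == "#":
            query.meta.append(term)
        else:
            # No prefix and + are treated alike, since AND is automatic.
            query.required.append(term)
    return query
```

For example, parse_query('+munax -"slow search" #author fast')
yields the required terms munax and fast, the excluded phrase
"slow search", and the meta term author. There is no NEAR operator
in the sketch because, as stated above, proximity is handled by
ranking rather than by the query syntax.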
Performance
The usable bandwidth is the sum of the bandwidths of all machines.
For instance, for a Munax system set up on 3000 computers on the
Internet, each computer using ADSL at 2.5 Mbps, the total bandwidth
will be 7.5 Gbps. For a corpus of 6 billion documents (average size
15 Kbytes), this means a (theoretical) total time of 34 hours to
crawl and index the whole corpus.
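
The arithmetic behind these figures can be checked with a short
sketch. It assumes 15 Kbytes means 15,000 bytes and ignores all
processing time. The bits-per-byte factor is a parameter because the
source does not state it: 8 bits per byte gives roughly 26.7 hours,
while the quoted 34 hours is close to what 10 bits per byte
(a common allowance for transmission overhead) would give. That
reading is a guess, not something the document confirms.

```python
MACHINES = 3000
LINK_BPS = 2.5e6      # ADSL, 2.5 Mbps per machine
DOCS = 6e9            # 6 billion documents
DOC_BYTES = 15_000    # "15 Kbytes" taken as 15,000 bytes (assumption)


def aggregate_gbps() -> float:
    """Total bandwidth as the sum of all machine links, in Gbps."""
    return MACHINES * LINK_BPS / 1e9


def crawl_hours(bits_per_byte: float) -> float:
    """Theoretical transfer time for the whole corpus, in hours."""
    total_bits = DOCS * DOC_BYTES * bits_per_byte
    seconds = total_bits / (MACHINES * LINK_BPS)
    return seconds / 3600


print(f"aggregate bandwidth: {aggregate_gbps():.1f} Gbps")
print(f"8 bits per byte:  {crawl_hours(8):.1f} hours")
print(f"10 bits per byte: {crawl_hours(10):.1f} hours")
```

Either way the estimate is a lower bound on transfer time alone;
indexing cost and uneven link speeds would add to it.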