Munax Search Technology


With Munax you can forget the struggle of organizing your documents. The same goes for everything else that can be indexed: images, videos and sound files, executables and source code, compressed files and email addresses. You can spread the data freely across your hard disks, the computers in your office or the World Wide Web, or any mix of the three.

Below, we disclose just a few of the things that make the Munax Search Technology unique.



Key features include

· Extreme performance

Munax uses the available hardware to the full to deliver maximum performance, both when indexing and when querying, whether it is installed on one computer or several. You can also easily let the Munax system grow as your business grows.

· Unlimited scaling

The modular subsystem architecture of Munax provides linear scalability across machines and subsystems. A Munax system can be scaled from a single machine running Munax in Automatic Mode up to any number of machines. Scaling is linear both when indexing and when querying, so machines can be added to handle any number of queries.

· Easy to manage

It takes only one person to manage a Munax system distributed over thousands of computers, which minimizes the management cost. Munax can also run unattended when set to Automatic Mode.

· Equalizer ranking (EQR)

Munax combines several state-of-the-art ranking algorithms to produce a final ranking. The EQR algorithms use hundreds of parameters in real time, together with the index data, to put the most relevant search results first, whether the search is done on a local network or on the World Wide Web.

· Any type of information is one click away (composite search)

Munax is the first search engine to offer composite search. With this feature, you do not have to be explicit about what you are searching for (documents, images, sound files ...). Just type in your query and Munax will find all the various documents and objects and list them on the same result page, i.e. a composite search result.

· Data protection

Munax is designed to limit system vulnerability. Through the use of security features, you can have all your documents encrypted and also set rules for which machines and users may access the search engine. This way, your data is secured and can never be accessed by hackers or other intruders.

· Automated backup

Whether you use Munax for site search or corporate search, the search system will back up your important pages and documents if you want it to.

· Internet distributable

You may choose to combine Munax Internet distribution with Munax mirroring, which makes it almost impossible to destroy the system.

· Wide range of file types for indexing supported

Full indexing, for instance: htm, html, shtm, shtml, jhtm, asp, php, php3, pdf, ps, doc, xls, ppt, rtf, wp, wp5, wp6, wpd, txt, c, cpp and h. More document types can easily be added for full indexing.

Meta indexing, for instance: gif, jpg, tga, bmp, iff, img, jif, mac, msp, pcx, pic, tif, ico, jpe, mp3, wav, ram, snd, mp4, aif, mid, vqf, la1, lav, mp2, avi, mpg, mpeg, rm, qt, asx, mov, fli, flc, eps, wri, asc, fmk, for, zip, gzip, tar, arc, lzh, sit, rar, arj, dd, tgz, lha, exe, hqx, dll, vbs, vxd, bat, cmd, class, jar, java, jav and email addresses.

Munax knows what type of files these are and groups them accordingly.
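The grouping described above can be pictured as a lookup from file extension to a group name. The following is a minimal sketch only: the group names, the function name group_for and the assignment of extensions to groups are assumptions made for illustration, and only a small subset of the extensions listed above is used.

```python
from pathlib import PurePosixPath

# Hypothetical groups (an assumption); the extensions are a subset of the lists above.
EXTENSION_GROUPS = {
    "image": {"gif", "jpg", "bmp", "tif", "ico"},
    "sound": {"mp3", "wav", "mid"},
    "movie": {"avi", "mpg", "mpeg", "mov"},
    "compressed": {"zip", "tar", "rar", "tgz"},
    "software": {"exe", "dll", "bat", "jar"},
    "document": {"html", "pdf", "doc", "txt"},
}


def group_for(filename: str) -> str:
    """Return the assumed group for a file name, or "unknown" if none matches."""
    # Take the last extension, drop the leading dot and ignore case.
    ext = PurePosixPath(filename).suffix.lstrip(".").lower()
    for group, extensions in EXTENSION_GROUPS.items():
        if ext in extensions:
            return group
    return "unknown"
```

For example, group_for("photo.JPG") falls into the "image" group, while a file without an extension is reported as "unknown".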



External interfaces

Munax is designed to be easily adapted to each customer situation. This provides the customer with the right solution for their current needs as well as the ability to expand the system capabilities as their requirements change.






Distributing execution power

Munax execution is logically distributed over subsystems, not machines. One machine can host several subsystems, and one subsystem can span several machines. This way, Munax can be distributed freely and each machine's execution power can be utilized to its maximum. For instance, one very powerful machine can have three full subsystems running while another machine may have only a dictionary server running.






Deploying Munax

Munax may be deployed and scaled in many ways, but four basic configurations are usually recommended:

The single onemachine system consists of one machine only, having two databases, one for queries and the other for update-indexing. The machine has the Munax web server integrated, i.e. the Front-End.

The double onemachine system is two machines, each having one database. One of the machines is used for queries while the other is busy update-indexing. Both machines have the Munax web server integrated. If a further machine is added to run the Munax web server (the Front-End), the system becomes a double onemachine system with Front-End, and the database machines become the Back-Ends. A double onemachine system provides higher performance as well as higher redundancy.

In a single multimachine system, Munax is distributed over several machines, and one (or every) machine also acts as a Front-End.

In the figure below, the double multimachine system with Front-Ends is pictured. It provides very high performance and redundancy. One of the Back-End systems is used for queries while the other is busy update-indexing. The computers in the picture are located on a local area network but Munax can just as well be distributed on computers spread over the Internet.

In all systems, Munax can be managed from the Munax Management console program on a single PC, from anywhere.







Search features

Words & operators: individual words (64 at maximum) can be combined with phrases; + and - include or exclude words; words are combined with an automatic AND; a NEAR operator is unnecessary because Munax automatically ranks by proximity; a # (hash mark) before a word finds that word in meta information tags.
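To make the operator rules concrete, here is a minimal sketch of how such a query string might be split into terms. It is an illustration only, not the Munax parser: the names parse_query and ParsedQuery are hypothetical, and the use of double quotes to mark a phrase is an assumption. The 64-word limit and the meaning of +, - and # are taken from the description above.

```python
import re
from dataclasses import dataclass, field
from typing import List

MAX_WORDS = 64  # limit stated in the feature description

# Optional prefix (+, - or #), then a double-quoted phrase or a single word.
TOKEN = re.compile(r'([+\-#]?)(?:"([^"]*)"|(\S+))')


@dataclass
class ParsedQuery:
    required: List[str] = field(default_factory=list)  # plain and "+" terms (automatic AND)
    excluded: List[str] = field(default_factory=list)  # "-" terms
    meta: List[str] = field(default_factory=list)      # "#" terms, matched in meta tags


def parse_query(query: str) -> ParsedQuery:
    """Split a query into required, excluded and meta terms (hypothetical syntax)."""
    parsed = ParsedQuery()
    word_count = 0
    for prefix, phrase, word in TOKEN.findall(query):
        text = phrase or word
        if not text:
            continue  # skip empty phrases such as ""
        word_count += len(text.split())
        if word_count > MAX_WORDS:
            raise ValueError(f"query exceeds {MAX_WORDS} words")
        if prefix == "-":
            parsed.excluded.append(text)
        elif prefix == "#":
            parsed.meta.append(text)
        else:
            parsed.required.append(text)
    return parsed
```

With this sketch, the query +munax -spam "search engine" #author yields the required terms munax and search engine, the excluded term spam and the meta term author.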

Composite search provides the feature of not having to be explicit about what type of information (documents, images, sound files ...) to search for. Just type in the query and Munax will find all the various documents and objects and list them on the same result page, i.e. a composite search result.

Multimedia pre-view / pre-listen: audio can be pre-listened to and videos can be pre-viewed, so the visitor can quickly decide whether this is what he/she is looking for.

Image magnification. Any image in the search results can be instantly magnified.

Search within a site across top domains. This means that a search can be done, for instance, within all Microsoft sites without stating the top domain.

Within top domain. Stating a specific top domain narrows the search to that top domain only. For instance, if the site = Microsoft and top domain = de, the search will be done only within the site microsoft.de.
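The two site options above can be sketched as a single host-name check. This is a simplified illustration under stated assumptions: the function name host_matches is hypothetical, the last label of the host name is treated as the top domain (so multi-part suffixes such as co.uk are not handled), and the site name is compared with the label directly before it.

```python
from typing import Optional
from urllib.parse import urlsplit


def host_matches(url: str, site: str, top_domain: Optional[str] = None) -> bool:
    """Check whether a URL belongs to a site, optionally within one top domain."""
    host = urlsplit(url).hostname or ""
    labels = host.lower().split(".")
    if len(labels) < 2:
        return False  # no recognizable site name and top domain
    if top_domain is not None and labels[-1] != top_domain.lower():
        return False  # a top domain was stated and this host is outside it
    # Assumption: the site name is the label directly before the top domain.
    return labels[-2] == site.lower()
```

Called without a top domain, host_matches accepts both http://www.microsoft.com/ and http://www.microsoft.de/ for the site Microsoft. With top_domain="de" only the second one is accepted.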

Offensive content exclusion.

Robots excluded search. Munax crawls robots-excluded documents too, if parameterized that way. Then it is up to the visitor of Munax to decide if he/she wants to view such documents.

Select document types: This includes the selection of any type, all non-html types, all html types, or any other type of document that Munax has been parameterized to crawl.

Language selection: Show only documents in a specific language.

Ranking combinations: If enabled by the system manager, a visitor can choose which ranking algorithms Munax should combine. Combinations include Relevance, docVote, Proximity and Multimedia.

Multimedia ranking: If selected, Munax will rank documents (html pages) that have rich multimedia content higher.
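How the selected ranking algorithms are actually combined is not described here. Purely as an illustration, the sketch below averages whichever per-algorithm scores the visitor has enabled and sorts the results by that average. The equal weighting, the score range of 0 to 1, the data layout and the function names are all assumptions. Only the four algorithm names come from the list above.

```python
from typing import Iterable, List, Mapping, Set

RANKINGS = {"Relevance", "docVote", "Proximity", "Multimedia"}  # names from the text


def combined_score(scores: Mapping[str, float], enabled: Set[str]) -> float:
    """Average the enabled per-algorithm scores (equal weights are an assumption)."""
    chosen = [scores[name] for name in enabled if name in scores]
    return sum(chosen) / len(chosen) if chosen else 0.0


def rank(documents: Iterable[dict], enabled: Set[str]) -> List[dict]:
    """Return documents sorted by combined score, best first."""
    unknown = enabled - RANKINGS
    if unknown:
        raise ValueError(f"unknown ranking algorithms: {sorted(unknown)}")
    return sorted(
        documents,
        key=lambda doc: combined_score(doc["scores"], enabled),
        reverse=True,
    )
```

In this sketch, enabling Multimedia alongside Relevance can move a media-rich page ahead of a page that scores better on Relevance alone.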

Dates: Search can be done on documents before or after a given date, as well as between dates.

Demand objects: Check boxes are provided so the visitor can demand that the search be done only for pages that contain links of specific types, i.e. images, sound, software, movie files, other types of documents, compressed files and/or email addresses.

Object types: When explicitly searching for objects, i.e. images, sound, software, movie files, other types of documents or compressed files, their file extension can be specified.

Object peek: For html searches, Munax displays tiny icons to show what link objects there are on the html page, i.e. images, sound, software, movie files, other types of documents, compressed files and/or email addresses.

Document type: For any given query, Munax displays the type of document and its appropriate extension.

View doc: The visitor can click on this link to request Munax to fetch the document from the document repository instead of fetching it from the web. This way, "binary" documents like PDF and Word documents can be viewed directly in the browser.

RTB: The visitor can click this link to view html pages with html tags stripped away.

Query result word highlighting: For each query result, the searched words are highlighted.
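Highlighting of this kind can be sketched with a case-insensitive whole-word match over the result snippet. The example below is an assumption about one possible approach, not the Munax implementation: the function name highlight and the use of <b> tags as the highlight markup are hypothetical, and the searched words are assumed to be plain alphanumeric words.

```python
import html
import re
from typing import List


def highlight(snippet: str, words: List[str]) -> str:
    """Wrap each searched word in <b>...</b> and HTML-escape everything else."""
    if not words:
        return html.escape(snippet)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    # With one capturing group, re.split alternates: plain text, match, plain text, ...
    parts = pattern.split(snippet)
    return "".join(
        f"<b>{html.escape(part)}</b>" if index % 2 else html.escape(part)
        for index, part in enumerate(parts)
    )
```

Because the match is on whole words, searching for "search" highlights "Search" in a snippet but leaves "research" untouched.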

View document words highlighting: When clicking View doc (to order Munax to fetch a document from the document repository) all searched words will be colorized in the viewed document.



Performance

The bandwidth utilization is the sum of all bandwidths. For instance, for a Munax system set up on 3000 computers on the Internet, each using ADSL at 2.5 Mbps, the total bandwidth will be 7.5 Gbps. For a corpus of 6 billion docs (average size 15 Kbytes) this means a (theoretical) total time of 34 hours to crawl & index the total corpus.
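The arithmetic behind this estimate can be reproduced with a short calculation. The sketch below assumes that 1 Mbps is 1,000,000 bits per second, that 15 Kbytes is 15 x 1024 bytes, and that every line is used at full rate with no protocol or indexing overhead. The function name crawl_time_hours is hypothetical.

```python
def crawl_time_hours(machines: int, mbps_per_machine: float,
                     documents: int, avg_doc_bytes: int) -> float:
    """Transfer time in hours when all machines download in parallel at line rate."""
    total_bits_per_second = machines * mbps_per_machine * 1_000_000
    total_bits = documents * avg_doc_bytes * 8
    return total_bits / total_bits_per_second / 3600


# Figures from the example above.
total_gbps = 3000 * 2.5 / 1000  # aggregate bandwidth in Gbps
hours = crawl_time_hours(3000, 2.5, 6_000_000_000, 15 * 1024)
print(f"{total_gbps} Gbps, about {hours:.1f} hours")
```

Under these assumptions the aggregate bandwidth is 7.5 Gbps and the pure transfer time comes to roughly 27 hours. The 34 hours quoted above therefore presumably rests on assumptions that are not stated here, such as overhead or a lower effective line rate. With 2 Mbps per computer, for example, the same calculation gives about 34 hours.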