mnoGoSearch users can fully customize the look and feel of search results by editing the template file search.htm, which resides in the /etc/ directory of the mnoGoSearch installation.
A mnoGoSearch template file is an HTML file with processing instruction blocks. The text outside of processing instruction blocks is printed as is. The text inside processing instruction blocks is interpreted as a C/C++-like program.
Note: Template files are not limited to HTML. They can be in any text-based format, e.g. XML or plain text.
Example:
<?mnogosearch
  cout << "Content-Type: text/html; charset=utf-8\r\n\r\n";
?>
<html>
<head>
<title>Example</title>
</head>
<body>
<?mnogosearch
  cout << "Hello!\n";
?>
</body>
</html>
The fragments <?mnogosearch .. ?> are processing instruction blocks. They consist of statements.
The following statement types are supported:
<statement> ::= <expression-statement> | <compound-statement> | <selection-statement> | <iteration-statement> | <jump-statement>
mnoGoSearch supports single-line (C++ style) and multi-line (C style) comments inside processing instruction blocks:
<?mnogosearch
  cout << "Test\n"; // This is a single-line comment
  /*
    This is a multi-line comment.
  */
  cout << "One more test\n";
?>
mnoGoSearch supports the following built-in standard C data types: char, int, and double.
Selection statements are the if..else blocks.
Note: The switch statement is not supported. It will be added in a future version.
Selection statements implement the following grammar:
<selection-statement> ::= if ( <expression> ) <statement> | if ( <expression> ) <statement> else <statement>
Iteration statements are the while, do, and for loops:
<iteration-statement> ::= while ( <expression> ) <statement> | do <statement> while ( <expression> ) | for ( {<declaration> | <expression>}? ; { <expression> }? ; {<expression>}? ) <statement>
Compound statements are code blocks inside curly brackets that can have optional variable declarations followed by optional nested statements.
<compound-statement> ::= '{' {<declaration>}* {<statement>}* '}'
<declaration> ::= <type-specifier> {<init-declarator>}* ;
<init-declarator> ::= <identifier> | <identifier> = <initializer>
<initializer> ::= <assignment-expression>
<type-specifier> ::= char | int | double | <class-name>
Note: mnoGoSearch currently does not support the following data types: short, long, float, signed, unsigned, struct, union, enum, typedef.
Expression statements consist of variable assignments, function calls, as well as operator and method invocations.
<expression-statement> ::= { <expression> }? ;
<postfix-expression> ::= <primary-expression> | <primary-expression> <function-call-arguments> | <primary-expression> . <identifier> <function-call-arguments> | <primary-expression> ++ | <primary-expression> --
<function-call-argument> ::= <assignment-expression>
<function-call-argument-list> ::= <function-call-argument> {, <function-call-argument>}*
<function-call-arguments> ::= ( <function-call-argument-list>? )
Note: As of version 3.4.1, function arguments are limited to variables and literals. Passing other kinds of expressions as function and method call arguments will be added in a future version.
<unary-expression> ::= <postfix-expression> | <unary-operator> <cast-expression> | ++ <unary-expression> | -- <unary-expression>
<unary-operator> ::= + | - | ~ | !
<multiplicative-expression> ::= <cast-expression> | <cast-expression> * <cast-expression> | <cast-expression> / <cast-expression> | <cast-expression> % <cast-expression>
<additive-expression> ::= <multiplicative-expression> | <multiplicative-expression> + <multiplicative-expression> | <multiplicative-expression> - <multiplicative-expression>
<shift-expression> ::= <additive-expression> | <additive-expression> << <additive-expression> | <additive-expression> >> <additive-expression>
<relational-expression> ::= <shift-expression> | <shift-expression> < <shift-expression> | <shift-expression> <= <shift-expression> | <shift-expression> >= <shift-expression> | <shift-expression> > <shift-expression>
<equality-expression> ::= <relational-expression> | <relational-expression> == <relational-expression> | <relational-expression> != <relational-expression>
<and-expression> ::= <equality-expression> | <equality-expression> & <equality-expression>
<logical-and-expression> ::= <or-expression> | <or-expression> && <or-expression>
<logical-or-expression> ::= <logical-and-expression> | <logical-and-expression> || <logical-and-expression>
<conditional-expression> ::= <logical-or-expression> | <logical-or-expression> ? <expression> : <conditional-expression>
<assignment-expression> ::= <conditional-expression> | <unary-expression> <assignment-operator> <assignment-expression>
<assignment-operator> ::= = | *= | /= | %= | += | -= | <<= | >>= | &= | ^= | |=
_argc
int _argc();
Returns the number of command line arguments.
_argv
string _argv(int n);
Returns the n-th command line argument as a string.
exit
int exit(int status);
Terminates the program and returns the status value to the caller (e.g. to search.cgi).
getenv
string getenv(string name);
Returns the value of the environment variable with the given name as a string.
mnogosearch_version
string mnogosearch_version();
Returns the current version of mnoGoSearch as a string.
rand
int rand();
Returns a random number.
sqrt
double sqrt(double number);
Returns the square root of the argument.
srand
int srand(int seed);
Sets the argument as the new seed for the random numbers returned by rand().
time
int time();
Returns the time in seconds since the Epoch.
to_string
string to_string(int number);
string to_string(double number);
Converts the given integer or double number to a string.
Strings are objects that represent sequences of single-byte characters.
Strings make no assumption about the character set. All methods of the string class operate in terms of bytes.
string methods
clear
void clear();
Clears the current string value, so the string becomes empty.
length
int length() const;
Returns the length of the string.
stoi
int stoi() const;
Converts the string to an integer number.
stod
double stod() const;
Converts the string to a double number.
compare
int compare(const string &) const;
Compares the string to another string.
find
int find(const string &substring) const;
Searches the string for the first occurrence of a substring.
append
int append(const string &) const;
Appends another string to the end of the current value.
regex_substr
string regex_substr(const string &pattern, const string &format) const;
Creates a new string from the first occurrence of the regular expression pattern and/or its subpatterns, combined according to format.
The characters in format other than $ are printed as is.
A sequence consisting of a dollar sign $ followed by a digit 0-9 is a back-reference. The entire matched pattern can be referenced as $0, and its matched parenthesized subpatterns can be referenced as $1 to $9.
Example: Extract the scheme and the host name from a URL
<?mnogosearch
{
  string x= "http://www.host.com/path/file.ext";
  string y= x.regex_substr("^[a-z]*://[^/]*", "$0");
  cout << y << '\n';
}
?>
This example will output:
http://www.host.com
Example: Extract the scheme, the host name, and the path from a URL using subpatterns
<?mnogosearch
{
  string x= "http://www.host.com/path/file.ext";
  string y= x.regex_substr("^([a-z]*)://([^/]*)/(.*)", "$1 -- $2 -- $3");
  cout << y << '\n';
}
?>
This example will output:
http -- www.host.com -- path/file.ext
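The back-reference rules above can be illustrated outside of a template. The following is a minimal C++ sketch, not mnoGoSearch code: regex_substr_sketch is a hypothetical helper, the POSIX extended dialect of std::regex is an assumption about the regular expression syntax, and returning an empty string when nothing matches is also an assumption, since that case is not described here.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not part of mnoGoSearch): expands $0..$9 in `format`
// using the first match of `pattern` in `subject`.
std::string regex_substr_sketch(const std::string &subject,
                                const std::string &pattern,
                                const std::string &format)
{
  const std::regex re(pattern, std::regex::extended);  // assumed dialect
  std::smatch m;
  if (!std::regex_search(subject, m, re))
    return "";  // assumption: the no-match result is not documented

  std::string out;
  for (std::size_t i = 0; i < format.size(); ++i) {
    const bool is_backref =
        format[i] == '$' && i + 1 < format.size() &&
        std::isdigit(static_cast<unsigned char>(format[i + 1]));
    if (is_backref) {
      const std::size_t group = static_cast<std::size_t>(format[i + 1] - '0');
      if (group < m.size())
        out += m[group].str();  // $0 is the whole match, $1..$9 are subpatterns
      ++i;                      // skip the digit
    } else {
      out += format[i];         // other characters are copied as is
    }
  }
  return out;
}
```

Called with the subject, pattern, and format of the two examples above, the sketch produces the same two output strings.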
regex_cut
string regex_cut(const string &pattern) const;
Creates a new string with all occurrences of the regular expression pattern or its subpatterns removed.
If no occurrences are found, the current string is returned as is.
If pattern has no parenthesized subpatterns, the entire matched regular expression is removed.
If pattern has parenthesized subpatterns, only the matched subpatterns are removed.
Example: Remove all letters from a string
<?mnogosearch
{
  string x= "1111a2222b3333c4444d5555";
  string y= x.regex_cut("[a-z]");
  cout << y << '\n';
}
?>
This example will output:
11112222333344445555
Example: Remove the entire query string from a URL
<?mnogosearch
{
  string x= "http://www.host.com/path/file.ext?a=b&c=d&e=f&g=h";
  string y= x.regex_cut("[?].*");
  cout << y << '\n';
}
?>
This example will output:
http://www.host.com/path/file.ext
Example: Remove the a and the e parameters from the query string of a URL
<?mnogosearch
{
  string x= "http://www.host.com/path/file.ext?a=b&c=d&e=f&g=h";
  string y= x.regex_cut("[?&](a=[^&]*&).*&(e=[^&]*&)");
  cout << y << '\n';
}
?>
This example will output:
http://www.host.com/path/file.ext?c=d&g=h
Note: This will remove the parameters a and e only if a precedes e.
Example: Remove all np parameters from the query string of a URL
<?mnogosearch
{
  string x= "http://localhost/search.cgi/search.htm?q=test&np=6&np=7&ps=10&np=8&np=9&np=10";
  string y= x.regex_cut("[?&]np=[^&]*");
  cout << y << '\n';
}
?>
This example will output:
http://localhost/search.cgi/search.htm?q=test&ps=10
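The removal rules (whole match when there are no subpatterns, only the subpatterns otherwise) can be sketched in standalone C++. This is an illustration under assumptions, not the mnoGoSearch implementation: regex_cut_sketch is a hypothetical name, the POSIX extended dialect is assumed, and nested subpatterns are simply skipped.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not part of mnoGoSearch): removes every match of
// `pattern`, or only its parenthesized subpatterns if it has any.
std::string regex_cut_sketch(const std::string &subject,
                             const std::string &pattern)
{
  const std::regex re(pattern, std::regex::extended);  // assumed dialect
  const bool has_groups = re.mark_count() > 0;
  std::string out;
  std::size_t copied = 0;  // subject[0, copied) is already copied or cut

  for (std::sregex_iterator it(subject.begin(), subject.end(), re), end;
       it != end; ++it) {
    const std::smatch &m = *it;
    const std::size_t first = has_groups ? 1 : 0;
    const std::size_t last = has_groups ? m.size() : 1;
    for (std::size_t g = first; g < last; ++g) {
      if (!m[g].matched)
        continue;
      const std::size_t begin =
          static_cast<std::size_t>(m[g].first - subject.begin());
      const std::size_t stop =
          static_cast<std::size_t>(m[g].second - subject.begin());
      if (begin < copied)
        continue;  // assumption: nested or overlapping groups are ignored
      out.append(subject, copied, begin - copied);  // keep text before the cut
      copied = stop;                                // drop the cut range
    }
  }
  out.append(subject, copied, std::string::npos);   // keep the tail
  return out;
}
```

With the inputs of the four examples above, the sketch returns the same strings as the documented outputs.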
left
string left() const;
Returns the left byte of the current value as a new string value.
substr
string substr(int pos) const;
string substr(int pos, int count) const;
Returns a substring that starts at byte position pos as a new string value. If the count parameter is not passed, the string is copied until the end. Otherwise, no more than count bytes are copied.
Note: The first character is at pos 0.
lower
string lower() const;
Returns a new string value with all characters of the current string converted to lower case.
upper
string upper() const;
Returns a new string value with all characters of the current string converted to upper case.
pcase
string pcase() const;
Returns a new string value with all characters of the current string converted to lower case, except the leftmost character, which is converted to upper case.
Example
<?mnogosearch
{
  string x="abc ABC";
  cout << x.pcase() << '\n';
}
?>
This example will output:
Abc abc
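For readers who want to reproduce the pcase() result outside of a template, here is a small C++ sketch of the same byte-wise rule. pcase_sketch is a hypothetical name, and the use of the C locale's tolower/toupper is an assumption about how single bytes are case-mapped.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of mnoGoSearch): lower-cases every byte,
// then upper-cases the leftmost byte.
std::string pcase_sketch(std::string s)
{
  for (char &c : s)
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  if (!s.empty())
    s[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[0])));
  return s;
}
```

For the input "abc ABC" this returns "Abc abc", matching the example above.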
urldecode
string urldecode() const;
Returns a new string with %## sequences decoded to their byte values and plus signs (+) replaced with spaces.
Example
<?mnogosearch
{
  string x="http://www.site.com/?q=%26%20%3c%20%3E%20%22+b+c";
  string y= x.urldecode();
  cout << y << '\n';
}
?>
This example will output:
http://www.site.com/?q=& < > " b c
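The decoding rule can be written down in a few lines of C++. This is only a sketch of the documented behavior: urldecode_sketch and hex_value are hypothetical names, and copying a malformed % sequence unchanged is an assumption, since that case is not described here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Value of one hexadecimal digit; the caller guarantees that c is a hex digit.
static int hex_value(char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

// Hypothetical helper (not part of mnoGoSearch): decodes %## sequences and
// turns '+' into a space, working byte by byte.
std::string urldecode_sketch(const std::string &in)
{
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '+') {
      out += ' ';
    } else if (in[i] == '%' && i + 2 < in.size() &&
               std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
               std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
      out += static_cast<char>(hex_value(in[i + 1]) * 16 + hex_value(in[i + 2]));
      i += 2;  // skip the two hex digits
    } else {
      out += in[i];  // assumption: anything else is copied unchanged
    }
  }
  return out;
}
```

Both upper-case and lower-case hex digits are accepted, as in the %3c and %3E sequences of the example above.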
htmlencode
string htmlencode() const;
Converts HTML special characters to entities.
urlencode
string urlencode() const;
Returns a string with all unsafe characters replaced with a percent (%) sign followed by two hex digits, and spaces replaced with plus (+) signs.
The unsafe characters include:
base64encode
string base64encode() const;
Returns a new string with the base-64 encoded form of the current string value.
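As an illustration of what a base-64 encoded form looks like, here is a C++ sketch of the standard encoding with the usual alphabet and = padding. base64encode_sketch is a hypothetical name; whether mnoGoSearch pads its output or wraps long lines is not stated here, so the padding behavior is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of mnoGoSearch): standard base-64 encoding
// with '=' padding and no line breaks.
std::string base64encode_sketch(const std::string &in)
{
  static const char alphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  for (std::size_t i = 0; i < in.size(); i += 3) {
    // Number of input bytes available for this 3-byte group (1, 2, or 3).
    const std::size_t n = in.size() - i < 3 ? in.size() - i : 3;
    unsigned int v = 0;
    for (std::size_t k = 0; k < 3; ++k)
      v = (v << 8) | (k < n ? static_cast<unsigned char>(in[i + k]) : 0u);
    out += alphabet[(v >> 18) & 63];
    out += alphabet[(v >> 12) & 63];
    out += n > 1 ? alphabet[(v >> 6) & 63] : '=';
    out += n > 2 ? alphabet[v & 63] : '=';
  }
  return out;
}
```

For example, the five bytes of "Hello" encode to "SGVsbG8=".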
The ENV class represents an environment - a global context (configuration) in which to search the database created by indexer.
ENV methods
int addline(string line);
Adds a new configuration line to the environment. Returns 0 on success, or a non-zero error code on error.
Example: Add a single configuration line
<?mnogosearch
{
  ENV env;
  if (env.addline("DBAddr mysql://root@localhost/test/"))
  {
    cout << env.errmsg() << "\n";
    exit(1);
  }
}
?>
This example uses the command DBAddr to specify the address of the search database.
Example: Load configuration from a file
<?mnogosearch
{
  ENV env;
  if (env.addline("include include.conf"))
  {
    cout << env.errmsg() << "\n";
    exit(1);
  }
}
?>
This example uses the command Include to load configuration from a file.
If the file does not exist, this example will output a string like this:
Can't open config file '/usr/local/mnogosearch/etc/include.conf': No such file or directory
string errmsg();
Returns the description of the last error, or an empty string if no errors have happened so far.
Example
<?mnogosearch
{
  ENV env;
  if (env.addline("DBAdr mysql://root@localhost/test/"))
  {
    cout << env.errmsg() << "\n";
    exit(1);
  }
}
?>
This example will output:
Unknown command: DBAdr
(Notice the typo in the command: the correct command is DBAddr.)
The RESULT class represents the result of a search query.
RESULT methods
int find(ENV &env, string query);
Executes a new search for query using a previously configured ENV env, and stores the query results in the current RESULT variable.
The string query must be formed according to the HTTP query string notation. The meaning of the individual parameters is described in the section "Search parameters" in Chapter 11.
Returns 0 on success, or a non-zero value on error.
Note: In case of an error, the error description is available by calling the method env.errmsg().
Example
<?mnogosearch
{
  ENV env;
  RESULT result;
  if (env.addline("DBAddr mysql://root@localhost/test/") ||
      result.find(env, "q=test&ps=20"))
  {
    cout << env.errmsg() << "\n";
    exit(1);
  }
  cout << result.total_found() << " documents were found\n";
  cout << "Displaying documents " << result.first() + 1 << "-" << result.last() + 1 << "\n";
}
?>
This example will output:
903 documents were found
Displaying documents 1-20
The above example passes test as the query text, and requests the search engine to return 20 documents per search result page.
int total_found() const;
Returns the total number of documents that matched the search query.
int first() const;
Returns the rank of the first document on the current search result page. The rank of a document varies between 0 (the first document) and total_found()-1 (the last document).
Note: mnoGoSearch returns documents in pages, 10 documents per page by default. Pages are switched when the user clicks the navigator bar:
Result pages: Previous 1 [2] 3 4 5 6 7 8 9 10 Next
The first document on the second page has rank 10, the first document on the third page has rank 20 (assuming the default page size), etc.
int last() const;
Returns the rank of the last document on the current search result page.
int num_rows() const;
Returns the number of documents on the current result page. It is equal to the value of the ps search query parameter, except on the last search result page, which can have fewer than ps documents.
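The relationship between first(), last(), num_rows(), and total_found() follows from the paging rules above. The C++ sketch below spells out that arithmetic; PageRange and page_range_sketch are hypothetical names, page is assumed to be a zero-based page index, and the requested page is assumed to exist (page * ps < total_found).

```cpp
#include <algorithm>

// Hypothetical type (not part of mnoGoSearch): the values that first(),
// last(), and num_rows() would be expected to return for one result page.
struct PageRange {
  int first;
  int last;
  int num_rows;
};

// Computes the zero-based ranks shown on page `page` (0 = first page)
// when `ps` documents are displayed per page.
PageRange page_range_sketch(int total_found, int page, int ps)
{
  PageRange r;
  r.first = page * ps;
  // The last page may be shorter than ps documents.
  r.last = std::min(r.first + ps, total_found) - 1;
  r.num_rows = r.last - r.first + 1;
  return r;
}
```

With 903 documents found and ps=20, the first page covers ranks 0 to 19, which is displayed to the user as documents 1-20. With ps=10, the last page starts at rank 900 and holds only 3 documents.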
int num_uniq_words() const;
Returns the number of unique words (search terms) that were found by the query parser in the original search query typed by the user.
int num_words() const;
Returns the number of words (search terms) that were found in the query, including word forms generated by fuzzy algorithms such as stemming, synonyms, etc. Each word typed by the user can produce zero or more additional generated word forms.
If no fuzzy search algorithms are enabled, num_uniq_words() and num_words() return the same value.
string property(string name) const;
Returns a search result property by name.
The following properties are understood:
property("qid")
Returns the query cache ID of the current query if the query was cached, or an empty string otherwise.
property("StrictModeFound")
Returns the number of search results that were found in a stricter mode if search.cgi automatically switched to a less strict mode, or an empty string otherwise.
property("WS")
Returns a suggested query, or an empty string.
property("SearchTime")
Returns a string representing time (in microseconds format) spent to generate search results.
Example
Search spent <b>
<?mnogosearch
{
  double st= res.property("SearchTime").stod() / 1000;
  string x= to_string(st);
  int pos= x.find(".");
  if (pos < x.length())
  {
    pos+= 4; // Preserve the dot and 3 fractional digits
    x= x.substr(0, pos);
  }
  cout << x;
}
?>
</b> seconds to generate search result.
This example will display search time in seconds with 3 fractional digits.
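The same truncation logic can be tried out in standalone C++. This sketch only mirrors the arithmetic of the example above (divide by 1000, keep the dot and three fractional digits); format_search_time_sketch is a hypothetical name, and it assumes that std::to_string prints a double with a dot and six fractional digits, which holds in the default C locale.

```cpp
#include <string>

// Hypothetical helper (not part of mnoGoSearch): divides the property value
// by 1000 and keeps the dot plus three fractional digits.
std::string format_search_time_sketch(const std::string &search_time)
{
  const double st = std::stod(search_time) / 1000;
  std::string x = std::to_string(st);  // e.g. "1.234000"
  const std::string::size_type pos = x.find('.');
  if (pos != std::string::npos)
    x = x.substr(0, pos + 4);          // preserve the dot and 3 digits
  return x;
}
```

For a property value of "1234" the sketch returns "1.234".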
string document_property(int n, string name) const;
Returns a property of the n-th document of the current result by name, in plain text format.
n must be between 0 and num_rows()-1.
The returned value is converted to BrowserCharset; characters that cannot be converted are replaced with question marks.
See the description of the document_property_html() method for the list of possible name values.
string document_property_html(int n, string name) const;
string document_property_html(int n, string name, string hlbeg, string hlend) const;
Returns a property of the n-th document of the current result by name, in HTML format.
n must be between 0 and num_rows()-1.
The returned value is converted to BrowserCharset; characters that cannot be converted are replaced with HTML numeric entities, e.g. &#1040;.
If the hlbeg and hlend parameters are passed, the words from the search query that generated the current result are highlighted with the given values.
The following document properties are understood:
url
The URL of the document.
alias
The URL of the document, with Alias settings applied.
order
The rank of the document in the current result set. The value varies between the first() and last() values of the current result.
title
The title of the document.
msg.subject
The Subject header of the document, if the document is of type message/rfc822, and an empty string otherwise.
score
The score of the document.
pop_rank
The popularity of the document, calculated taking into account the number of incoming and outgoing links.
body
An excerpt from the body of the document, extracted according to the ExcerptSize and ExcerptPadding settings.
stored_href
A link to the cached copy of the document, or an empty string if a cached copy is not available.
meta.name
The content of the name meta tag of the document, where name can be any meta tag, e.g. keywords or description.
Content-Type
The content type of the document.
Content-Length
The size of the document, in bytes.
Last-Modified-Timestamp
The timestamp telling when the document was last modified, in seconds since the Epoch.
Last-Modified
The value of Last-Modified-Timestamp formatted according to DateFormat.
PerSite
The number of documents that were found on the same site. It's available only if GroupBySite is enabled.
UniqueWordHitVector
A string consisting of digits 1 and 0 (e.g. 01010), with length equal to num_uniq_words().
1 on the i-th rightmost position means that the i-th word was found in this document, and 0 means that this word was not found.
SectionHitVector
A string consisting of digits 1 and 0 (e.g. 001010), with length equal to NumSections, describing query word distribution between sections of the document.
1 on the i-th rightmost position means that some of the query words were found in the i-th section, and 0 means that no words were found in this section.
For example, with the default configuration, 0..0000001 means that all query words were found in the body of the document and no words were found in the other document sections (such as title or meta tags).
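Both hit vectors are read from the right. The C++ sketch below lists the positions that hold a 1, counted from the rightmost character. hit_offsets_from_right is a hypothetical name, and the offsets it returns start at 0; how these offsets map to word or section numbers in mnoGoSearch (counting from 0 or from 1) is not specified here and is left as an assumption for the reader to verify.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of mnoGoSearch): returns the offsets,
// counted from the rightmost character (offset 0), that contain '1'.
std::vector<std::size_t> hit_offsets_from_right(const std::string &vec)
{
  std::vector<std::size_t> offsets;
  for (std::size_t i = 0; i < vec.size(); ++i) {
    if (vec[vec.size() - 1 - i] == '1')
      offsets.push_back(i);
  }
  return offsets;
}
```

For the vector 01010 the sketch reports offsets 1 and 3; for a vector ending in a single 1, such as 0000001, it reports only offset 0.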
QUERYWORD word(int n) const;
Returns the statistics of the n-th query word, as a QUERYWORD instance.
The QUERYWORD class is designed to represent query words (search terms) found in the search query typed by the user, as well as their statistics in a search result set.
QUERYWORD methods
string word();
Returns the word as a string.
int count();
Returns the number of hits of the word (the total number of times the word was found in the database). Note that a word can be found multiple times in each document.
int doccount();
Returns the number of documents the word was found in.
int order();
Returns the ID of the word, between 0 and res.num_uniq_words()-1, where res is the result set that produced the current query word.
All forms of the same word generated by fuzzy algorithms (e.g. synonyms or stemming) have the same ID.
int origin();
Returns the origin of the word.
The following word origin values are possible:
1 - a normal word from the original user query that was used as a search term.
2 - a word form that was generated by the stemming fuzzy search algorithm.
3 and 4 - a word form that was generated by the synonyms fuzzy search algorithm.
5 - a stopword. The word was typed by the user, but was not used as a search term.
6 - a suggestion instead of a possibly mis-typed query word which did not produce any hits.
7 - an accent insensitive form of an original query word.
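The origin codes listed above can be turned into readable labels when printing word statistics. The following C++ sketch shows one such mapping; origin_label_sketch and the label texts are hypothetical. Note that it uses a switch statement, which the template language does not support, so inside a template the same mapping would have to be written as a chain of if..else statements.

```cpp
#include <string>

// Hypothetical helper (not part of mnoGoSearch): maps the value returned by
// QUERYWORD::origin() to a short label, following the list above.
std::string origin_label_sketch(int origin)
{
  switch (origin) {
    case 1: return "query word";
    case 2: return "stemming form";
    case 3:
    case 4: return "synonym";
    case 5: return "stopword";
    case 6: return "suggestion";
    case 7: return "accent insensitive form";
    default: return "unknown";  // values outside the documented list
  }
}
```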
int weight();
Returns the weight (importance) of the word.
DOCUMENT is used to display cached copies, as well as to include external documents.
DOCUMENT methods
int cached(const ENV &env, string query_string);
Extracts a cached copy of a document from the database, by the URL of the document.
query_string must be formatted according to the HTTP query string notation and contain the URL= parameter:
URL=http:%2F%2Fsite.com%2Fpath%2Fpage.html
Returns 0 on success, or a non-zero code on error.
int download(const ENV &env, string url);
Downloads a document from the given location.
Returns 0 on success, or a non-zero code on error.
download can be used to include external documents. See the description of the method content for an example.
string content();
Returns the content of the document.
Example
<?mnogosearch
{
  ENV env;
  DOCUMENT inc;
  inc.download(env, "http://www.mnogosearch.org");
  cout << inc.content();
}
?>
This example downloads an external document from the given location and prints it.
string content_highlight(string hlbeg, string hlend);
Returns the content of the document with the query words highlighted.
This method is used to display a cached copy of the document. The cached copy must be loaded beforehand using the method cached.
string property(string name);
Returns a property of the document by the property name, in plain text format.
Properties are used to display the header of a cached copy of a document and are available after a call to the method cached.
For a list of known property names, see the description of RESULT::document_property_html.
WARNING: Since the template file contains sensitive information such as the database password, it is recommended to set the permissions of the template file so that it can be read only by you and the search program. Otherwise your passwords may leak.