Latest revision as of 20:35, 9 February 2012

Sphinx

This page is helpful; I used their config and modified it for our needs.
The Sphinx indexer just runs from a cron job, so that part is simple.
As for the front end, we are going to look at packaging the above-linked MW extension.
- The extension depends on sphinxapi.php, which is in the libsphinxclient package, at /usr/share/doc/libsphinxclient-0.9.9/sphinxapi.php.
- The extension does not seem to work with MW 1.16, but we want to upgrade eventually anyway.
Sphinx does not crawl; it only indexes databases, which kind of defeats the purpose for us.

Xapian

Doesn't have a crawler built in.
Most stuff is done via Omega; Xapian just backs it.
Hacky way to crawl sites: crawl with htdig, then convert the output into a format Omega understands and can index.
htdig is unsupported and OLD.
htdig seems to segfault on https sites in my testing.
Omega's default UI is ugly but that is changeable.
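
The "convert into a format Omega understands" step could look roughly like the sketch below. It assumes the target is the plain-text dump format read by Omega's scriptindex tool (one `name=value` line per field, continuation lines prefixed with `=`, records separated by a blank line). The field names (`url`, `title`, `text`), the helper names, and the sample pages are all hypothetical; the real field names have to match whatever index script is used, and the htdig side is not shown.

```python
def to_scriptindex_record(fields):
    """Render one document as a scriptindex-style record.

    Each field becomes `name=value`. Extra lines of a multi-line value
    are written as continuation lines that start with `=`, so an empty
    line inside a value never looks like a record separator.
    """
    lines = []
    for name, value in fields.items():
        first, *rest = str(value).split("\n")
        lines.append(f"{name}={first}")
        lines.extend(f"={cont}" for cont in rest)
    return "\n".join(lines)


def to_scriptindex_dump(pages):
    """Join records with a blank line, which separates documents."""
    return "\n\n".join(to_scriptindex_record(p) for p in pages) + "\n"


# Hypothetical crawl output; field names are assumptions for illustration.
pages = [
    {"url": "https://example.org/a", "title": "Page A",
     "text": "first line\nsecond line"},
    {"url": "https://example.org/b", "title": "Page B",
     "text": "only line"},
]
dump = to_scriptindex_dump(pages)
print(dump)
```

The resulting text would be written to a file and handed to scriptindex together with an index script that says what to do with each field.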

Mnogosearch

Link: http://www.mnogosearch.org/
Looks nice. Has a somewhat nice UI, and is customizable.
Built-in crawler, with a default 1000-line (with comments) config file.
CGI barfs when there are results: bug 19129 and bug 19141 upstream.
- Being able to view results might be important, in a search engine. :)

Others to try

Apache Lucene (with Apache Nutch to crawl).
- Relies heavily on Java, so probably out of the question (Lucene is Java, Nutch is a Tomcat servlet. Nuff said.)
Datapark Search
- Fork of Mnogosearch?
- Written in C.

@@ Line 4: / Line 4: @@
 * The Sphinx indexer simply runs on a cron, so that part is simple.
 * As far as front end, we are going to look at packaging the above linked MW extension.
-** The extension depends on sphinxapi.php, which is in the libsphinxclient package, at */usr/share/doc/libsphinxclient-0.9.9/sphinxapi.php*.
+** The extension depends on sphinxapi.php, which is in the libsphinxclient package, at '''/usr/share/doc/libsphinxclient-0.9.9/sphinxapi.php'''.
 ** The extension does not seem to work with MW 1.16, but we want to upgrade eventually anyway.
+* '''Sphinx does not crawl, it only indexes databases, which kind of defeats the purpose for us.'''
+= Xapian =
+* Doesn't have a crawler built in.
+* Most stuff is done via Omega, Xapian just backs it.
+* Hacky way to crawl sites: Crawl with htdig, convert into a format omega understands and can index.
+* htdig is unsupported and '''OLD'''.
+* htdig seems to segfault on https sites in my testing.
+* Omega's default UI is '''ugly''' but that is changeable.
+= Mnogosearch =
+* [http://www.mnogosearch.org/ Link]
+* Looks nice. Has a somewhat nice UI, and is customizable.
+* Built in crawler, with a default 1000 line (with comments) config file.
+* CGI barfs when there are results: [http://mnogosearch.org/bugs/index.php?id=19129 bug 19129] and [http://mnogosearch.org/bugs/index.php?id=19141 bug 19141] upstream.
+** Being able to view results might be important, in a search engine. :)
+= Others to try =
+* Apache Lucene (with Apache Nutch to crawl).
+** Relies heavily on Java, so probably out of the question (Lucene is Java, Nutch is a Tomcat servlet. Nuff said.)
+* [http://www.dataparksearch.org/ Datapark Search]
+** Fork of Mnogosearch?
+** Written in C.
+* ASPseek
+** C++
+** Last copyright year on [http://www.aspseek.org/ their site] is 2003. Is it unmaintained?

Search

User:Codeblock/Search: Difference between revisions
