Zebra has been deployed in numerous academic and commercial applications, in domains as diverse as bibliographic catalogues, geospatial information, structured vocabulary browsing, government information locators, civic information systems, environmental observations, museum information and web indexes.
Notable applications include the following:
Koha is a full-featured open-source ILS, initially developed in New Zealand by Katipo Communications Ltd and first deployed in January 2000 for the Horowhenua Library Trust. It is currently maintained by a team of software providers and library technology staff from around the globe.
LibLime, a company that markets and supports Koha, has added the Zebra database server to the new Koha 3.0 release to drive its bibliographic database.
In early 2005, the Koha project development team began looking at ways to improve MARC support and overcome scalability limitations in the Koha 2.x series. After extensive evaluations of the best open-source text database engines, including MySQL full-text searching, PostgreSQL, Lucene and Plucene, the team selected Zebra.
"Zebra completely eliminates scalability limitations, because it can support tens of millions of records," explained Joshua Ferraro, LibLime's Technology President and Koha's Project Release Manager. "Our performance tests showed search results in under a second for databases with over 5 million records on a modest 900 MHz i386 test server."
"Zebra also includes support for true boolean search expressions and relevance-ranked free-text queries, both of which the Koha 2.x series lack. Zebra also supports incremental and safe database updates, which allow on-the-fly record management. Finally, since Zebra has at its heart the Z39.50 protocol, it greatly improves Koha's support for that critical library standard."
Although the bibliographic database will be moved to Zebra, Koha 3.0 will continue to use a relational SQL-based database design for the 'factual' database. "Relational database managers have their strengths, in spite of their inability to handle large numbers of bibliographic records efficiently," summed up Ferraro. "We're taking the best from both worlds in our redesigned Koha 3.0."
See also LibLime's newsletter article Koha Earns its Stripes.
Kete is a digital object management repository, initially developed in New Zealand. Initial development was a partnership between the Horowhenua Library Trust and Katipo Communications Ltd, funded as part of the Community Partnership Fund in 2006. Kete is purpose-built software that enables communities to build their own digital libraries, archives and repositories.
It is based on Ruby on Rails and MySQL, and integrates the Zebra server and the YAZ toolkit for indexing and retrieving its content. Zebra runs as a separate process from the Kete application. See how Kete manages Zebra.
Why does Kete use Zebra? Speed, scalability and easy integration with Koha. Read their detailed reasoning here.
Emilda is a complete Integrated Library System, released under the GNU General Public License. It has a full-featured Web OPAC allowing comprehensive system management from virtually any computer with an Internet connection, a template-based layout that lets anyone alter the visual appearance of Emilda, and an XML-based language system for fast and easy localization into virtually any language. Currently, Emilda is used at three schools in Espoo, Finland.
As a bonus, 100% MARC compatibility has been achieved by using the Zebra server from Index Data as the backend server.
Reindex.net is a web-based library service offering all traditional library functions at a very high level, plus many new services. Reindex.net is a comprehensive and powerful web system based on standards such as XML and Z39.50. Reindex supports MARC21, danMARC or Dublin Core with UTF-8 encoding.
Reindex.net runs on Debian GNU/Linux with Zebra and Simpleserver from Index Data for bibliographic data. The relational database system Sybase 9 XML is used for administrative data. Internally, MARCXML is used for bibliographic records. Updates use Z39.50 Extended Services.
DADS is a huge database of more than ten million records, totalling over ten gigabytes of data. The records are metadata about academic journal articles, primarily scientific; about 10% of these metadata records link to the full text of the articles they describe, a body of about a terabyte of information (although the full text is not indexed).
It allows students and researchers at DTU (Danmarks Tekniske Universitet, the Technical University of Denmark) to find and order articles from multiple databases in a single query. The database contains literature on all engineering subjects. It is available online through a web gateway, though currently only to registered users.
More information can be found at http://www.dtv.dk/ and http://dads.dtv.dk
The InfoNet Eprints service from the Technical Knowledge Center of Denmark provides access to documents stored in eprint/preprint servers and institutional research archives around the world. The service is based on Open Archives Initiative metadata harvesting of selected scientific archives. These open archives offer free and unrestricted access to their contents.
InfoNet Eprints currently holds 1.4 million records from 16 archives. The online search facility is found at http://preprints.cvt.dk.
The Alvis EU project, run under the 6th Framework Programme (IST-1-002068-STP), is building a semantic-based peer-to-peer search engine. A consortium of eleven partners from six European Community countries plus Switzerland and China contributes expertise in a broad range of specialties, including network topologies, routing algorithms, linguistic analysis and bioinformatics.
The Zebra information retrieval indexing engine is used inside the Alvis framework to manage huge collections of natural-language-processed and enriched XML data coming from a topic-relevant web crawl. In this application, Zebra ingests and manages 37 GB of XML data in about 4 hours, with search times of a fraction of a second.
The M25 Systems Team has created a union catalogue for the periodicals of the twenty-one constituent libraries of the University of London and the University of Westminster (http://www.m25lib.ac.uk/ULS/). They have achieved this using an unusual architecture, which they describe as a "non-distributed virtual union catalogue".
The member libraries send in data files describing their periodicals, including both brief bibliographic data and summary holdings. From these, 21 individual Z39.50 targets are created, each using Zebra, and all mounted on a single hardware server. The live service provides a web gateway allowing Z39.50 searching of all the targets or a selection of them. Zebra's small footprint allows a relatively modest system to host the 21 servers comfortably.
More information can be found at http://www.m25lib.ac.uk/ULS/
Fernuniversität Hagen in Germany has developed a natural language interface for access to library databases. To evaluate this interface for recall and precision, they chose Zebra as the basis for measuring retrieval effectiveness. The Zebra server contains a copy of the GIRT database, consisting of more than 76,000 records in SGML format (bibliographic records from the social sciences), which are mapped to MARC for presentation.
(GIRT is the German Indexing and Retrieval Testdatabase. It is a standard German-language test database for intelligent indexing and retrieval systems. See http://www.gesis.org/forschung/informationstechnologie/clef-delos.htm)
Evaluation will take place as part of the 2003 TREC/CLEF campaign (http://clef.iei.pi.cnr.it).
For more information, contact Johannes Leveling
<Johannes.Leveling@FernUni-Hagen.De>
Zebra has been used by a variety of institutions to construct indexes of large web sites, typically in the region of tens of millions of pages. In this role, it functions somewhat similarly to the engine of Google or AltaVista, but for a selected intranet or a subset of the whole Web.
For example, Liverpool University's web-search facility (see the home page at http://www.liv.ac.uk/ and many sub-pages) works by relevance-searching a Zebra database populated by the Harvest-NG web-crawling software.
For more information on Liverpool University's intranet search architecture, contact John Gilbertson
<jgilbert@liverpool.ac.uk>
Kang-Jin Lee has recently modified the Harvest web indexer to use Zebra as its native repository engine. His comments on the switchover from the old engine are revealing:
The first results after some testing with Zebra are very promising. The tests were done with around 220,000 SOIF files, which occupy 1.6 GB of disk space.
Building the index from scratch takes around one hour with Zebra, whereas [old-engine] needs around five hours. While [old-engine] blocks search requests when updating its index, Zebra can still answer search requests. [...] Zebra supports incremental indexing, which will speed up indexing even further.
While the search time of [old-engine] varies from a few seconds to a few minutes depending on how expensive the query is, Zebra usually takes around one to three seconds, even for expensive queries. [...] Zebra can search more than 100 times faster than [old-engine] and can process multiple search requests simultaneously.
I am very happy to see such nice software available under GPL.