Search Engine News
Dariusz Kogut IT/WO
The search engine used at CERN to search across all CERN Web servers
is UltraSeek, provided by the company Infoseek (see the article
"CERN Search Syntax Summary" in this CNL).
Information on searching a subset of the CERN Intranet,
from both the reader's and the author's point of view, is
provided on the corresponding WebOffice Web page at
http://www.cern.ch/WebOffice/Doc/Search/.
Since December 7th, CERN has been running UltraSeek Server 3.02.
This article lists the Release Notes (new features and
implementation changes).
New features:
-
Date range searches.
Users can specify a range of document dates in their searches.
-
New language support.
Lexical support has been added for Swedish, Danish, Finnish and Norwegian.
The previous version supported only English, Dutch, French, German, Italian,
Portuguese and Spanish.
-
Cooperative spidering.
Multiple spidering instances of UltraSeek can now cooperate
(they are able to feed hyperlink URLs to each other).
-
Merge collection.
A new kind of collection can be built by merging the indices from other
collections.
This is useful if you have many collections and you want to produce
a single set
of indices for search.
-
XML support.
Documents written in the Extensible Markup Language (XML) can
be indexed.
-
SSL support.
The spider supports the Secure Sockets Layer and can fetch documents
using HTTPS.
Available as an option.
-
HTML meta tag names.
This feature lets you specify exactly which "meta" tags to
use for title override, summary override, date override,
keywords and publisher.
-
Word spam detection thresholds.
The UltraSeek indexer includes an automatic spam detection algorithm. It
decides that a document is spamming a certain word if that word occurs
more than a certain number of times within a sequence of 100 words.
-
Disallow text or links from apparent directory
listing documents.
You can disable the indexing of text or the following of links from
documents that are directory listings.
-
Document type parsing.
You can customize the names of document content-types.
-
Add URL collection determination.
UltraSeek Server can add the URL to all collections that
allow it, and can also redirect
the request to another instance of UltraSeek Server if cooperative
spidering is configured.
-
URL status.
This feature can automatically determine which collection a URL
belongs to.
-
Revisit site.
You can request that all URLs from a specific site be revisited.
-
Site and URL listings.
You can get a listing of all known URLs on a specified site. Site
and URL listings are now available to non-administrative users.
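The word spam detection rule described above (a word occurring more than a certain number of times within a sequence of 100 words) can be illustrated with a sliding-window count. The following is only a minimal sketch of that idea, not UltraSeek's actual implementation: the function name `spammed_words` and the default `threshold` of 10 are assumptions made for illustration, since the Release Notes do not state the real threshold.

```python
from collections import Counter, deque


def spammed_words(words, window=100, threshold=10):
    """Return the words that occur more than `threshold` times
    within any run of `window` consecutive words.

    `threshold=10` is a hypothetical value chosen for this sketch.
    """
    counts = Counter()   # occurrences of each word inside the current window
    recent = deque()     # the words currently inside the window, oldest first
    flagged = set()

    for word in words:
        recent.append(word)
        counts[word] += 1

        # Slide the window: drop the oldest word once it is full.
        if len(recent) > window:
            oldest = recent.popleft()
            counts[oldest] -= 1

        # Only the word just added can have newly crossed the threshold.
        if counts[word] > threshold:
            flagged.add(word)

    return flagged


if __name__ == "__main__":
    text = "cheap " * 12 + "holidays in the sun"
    print(spammed_words(text.lower().split()))
```

A caller would typically lower-case and tokenize the document text first, as in the usage lines at the end of the sketch, and could then decide how to treat documents for which the returned set is not empty.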
Implementation changes:
-
Filter Lotus Domino navigation links.
The Filter Lotus Domino Navigation Links feature has been improved.
-
Add URL speed of indexing.
URLs added through the "Add URL" interface while the spider
is busy get indexed more quickly than in the previous version.
-
Use HTTP Keep Alives.
A bug in the spider that caused it to time out when communicating
with some HTTP servers that have keep-alives turned on has been fixed.
You can also visit the UltraSeek Server Patches
Release Notes Web page.
For comments, send e-mail to www.support@cern.ch
For matters related to this article please contact the author.
Cnl.Editor@cern.ch
Last Updated on December 15th, 1998 at 10:12:55
Copyright © CERN 1998 -- European Laboratory for Particle Physics