(r2) SquidMonitoringTFInfoSystem < LCG

This is a proposal for how to store and maintain squid configuration information for the WLCG.

Terms:

Squid machine: a computer that runs a squid process
Squid service: a squid machine or set of squid machines that perform a specific function or functions. Sites may have multiple squid services or a single one. Sites that accept opportunistic grid jobs are encouraged to have a squid service for opportunistic use that is separate from the production service.
Squid proxy: a squid service used as an HTTP forward proxy, as opposed to a reverse proxy.
Squid monitoring servers: the pair of machines implementing wlcg-squid-monitor.cern.ch

Squid configuration information will take 3 forms:

Information System View: a list of squid services at all sites as public internet DNS names. If there is only one squid machine in a squid service, the DNS name can be the primary name or an alias for the machine, but if there are multiple squid machines in the service, the name must be a round-robin alias listing all of the IP addresses of the squid machines (that is, a single address of a hardware load balancer is not allowed). Does not include port numbers.
Monitoring View: a list of individual public DNS names for each squid, with monitoring port numbers. The default port is 3401.
Worker Node View: list of squid proxies to use on worker nodes, with proxy port numbers. The default port is 3128. May be on a private network. If there are multiple squids, they may be in a round-robin DNS alias or a hardware load balancer. May include backup proxies at other sites, to be tried if all previous proxies have failed.

Details:

Store the Information System View in GOCDB and OIM. Maintenance of the information is the responsibility of site administrators.
Create the Monitoring View and Worker Node View from the Information System View by means of translation files on the squid monitoring servers. There will be one file per VO for each of the two Views, all in the same simple format. Maintenance of the translation files will be the responsibility of operations personnel from each VO. The information system data (that is, site names, squid service names, and VO names) will be read either directly from GOCDB and OIM or via ATP. The CMS VO will also make use of a translation of site names from the Information System View into the CMS site names, either from ATP or directly from the CMS SiteDB. A simple site with only one squid on a public network using the default ports will need no translation entries, but sites with multiple squids, a private worker node network, or non-standard ports will need entries. The translation files will also be able to add whole sites that aren't in the Information System View, but that will be discouraged.
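
The translation step can be illustrated with a minimal Python sketch. The proposal does not define the translation file format, so the dictionaries below are a hypothetical stand-in for it, and the names `derive_view` and `with_default_port` and all host names are assumptions made for illustration. Only the default ports (3401 and 3128) come from the text above. The sketch assumes plain host names, not IPv6 literals.

```python
MONITORING_PORT = 3401  # default monitoring port named in the proposal
PROXY_PORT = 3128       # default proxy port named in the proposal


def with_default_port(entry, default_port):
    """Return 'host:port', appending default_port when entry has no port."""
    host, sep, port = entry.partition(":")
    return f"{host}:{port}" if sep else f"{host}:{default_port}"


def derive_view(services_by_site, translations, default_port):
    """Build one View from the Information System View.

    services_by_site: {site: [squid service DNS name, ...]}, without ports
    translations:     {service name: [replacement entries]} (hypothetical format)
    """
    view = {}
    for site, services in services_by_site.items():
        entries = []
        for service in services:
            # Without a translation entry the service name is used as-is,
            # which covers the simple single-squid, default-port site.
            for entry in translations.get(service, [service]):
                entries.append(with_default_port(entry, default_port))
        view[site] = entries
    return view


# Hypothetical data: SITE-A is a simple site, SITE-B needs translation entries.
info_system = {
    "SITE-A": ["squid.site-a.example"],
    "SITE-B": ["squids.site-b.example"],
}
monitoring_translations = {
    "squids.site-b.example": [
        "squid1.site-b.example",
        "squid2.site-b.example:3402",
    ],
}
worker_translations = {
    "squids.site-b.example": ["squids.private.site-b.example:8080"],
}

monitoring_view = derive_view(info_system, monitoring_translations, MONITORING_PORT)
worker_view = derive_view(info_system, worker_translations, PROXY_PORT)
```

In this sketch SITE-A needs no entries in either translation table, while SITE-B lists its individual squids for monitoring and a private alias with a non-standard port for worker nodes.
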
Generate Web Proxy Auto-Discovery (WPAD) files from the Worker Node View for each site, and supply a web service at http://wlcg-wpad.cern.ch/wpad.dat that every WLCG worker node may contact as frequently as once per job to find out which proxies to use. The correct wpad.dat will be returned depending on the source IP address. Initially this service can be an alias for the squid monitoring servers, but later, depending on performance, it can be moved to different servers. The wpad.dat files will also be made available on wlcg-wpad.cern.ch for lookup by site name or IP address, so that someone who is not running at a site can find out which proxies that site uses. Individual sites may instead configure their own service at http://wpad/wpad.dat, which will take precedence, and large sites will be encouraged to do so for performance reasons (even if only as a reverse proxy of the centrally generated file).
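
As an illustration of what a generated file might contain, the following Python sketch renders an ordered proxy list as a proxy auto-config script, the format conventionally served as wpad.dat. The function name `make_wpad` and the host names are hypothetical. The sketch assumes that every URL gets the same proxy list and that no direct fallback is added, neither of which the proposal specifies.

```python
def make_wpad(proxies):
    """Return the text of a wpad.dat file that lists proxies in order.

    Clients try the entries from left to right, so backup proxies at
    other sites belong at the end of the list.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    chain = "; ".join(f"PROXY {proxy}" for proxy in proxies)
    return (
        "function FindProxyForURL(url, host) {\n"
        f'    return "{chain}";\n'
        "}\n"
    )


# Hypothetical Worker Node View entry: one local proxy, one backup elsewhere.
wpad_text = make_wpad([
    "squids.private.site-b.example:3128",
    "backup-squid.site-a.example:3128",
])
print(wpad_text)
```
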
Whenever Squid-related information is duplicated in more than one source, audits will regularly compare them, and notices will be sent to operations personnel when they don't match. The information can also be stored in different forms (e.g. in AGIS & ATP) but it should come from the above primary sources.
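
A minimal sketch of such an audit in Python follows, assuming each source can be reduced to a mapping from site name to a list of squid service names. The function name `audit_sources` and the sample data are hypothetical, and how notices are actually sent is left out.

```python
def audit_sources(primary, copy):
    """Return {site: (only_in_primary, only_in_copy)} for sites that differ."""
    mismatches = {}
    for site in sorted(set(primary) | set(copy)):
        in_primary = set(primary.get(site, ()))
        in_copy = set(copy.get(site, ()))
        if in_primary != in_copy:
            mismatches[site] = (
                sorted(in_primary - in_copy),
                sorted(in_copy - in_primary),
            )
    return mismatches


# Hypothetical data: the duplicate has a stale name for SITE-B.
primary_source = {
    "SITE-A": ["squid.site-a.example"],
    "SITE-B": ["squids.site-b.example"],
}
duplicate_source = {
    "SITE-A": ["squid.site-a.example"],
    "SITE-B": ["old-squid.site-b.example"],
}

for site, (missing, extra) in audit_sources(primary_source, duplicate_source).items():
    print(f"{site}: missing from copy {missing}, unexpected in copy {extra}")
```
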
The Monitoring View is only needed on the squid monitoring server so it doesn't need to be made available publicly. The Worker Node View will only be made available to the public via the wpad.dat files, although it may be translated elsewhere to other forms (e.g. ATLAS will want to generate $FRONTIER_SERVER from it combined with frontier server information from AGIS). A tool will be provided to look up a list of proxies given a source address and destination URL. It shouldn't be necessary to put the Worker Node View into AGIS or ATP.
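
The source-address half of such a lookup tool could be sketched in Python as below. The proposal does not say how source addresses are mapped to sites, so the table of site networks is an assumption, as are the function names and all addresses (taken from the ranges reserved for documentation). The destination URL is omitted because the proposal does not describe how it would affect the result.

```python
import ipaddress


def site_for_address(source_ip, site_networks):
    """Return the site whose most specific network contains source_ip, or None."""
    addr = ipaddress.ip_address(source_ip)
    best_site, best_prefix = None, -1
    for site, networks in site_networks.items():
        for cidr in networks:
            net = ipaddress.ip_network(cidr)
            # Compare only within the same IP version, and prefer the
            # longest matching prefix when networks overlap.
            if addr.version == net.version and addr in net and net.prefixlen > best_prefix:
                best_site, best_prefix = site, net.prefixlen
    return best_site


def proxies_for(source_ip, site_networks, worker_view):
    """Return the Worker Node View proxies for source_ip, or [] if no site matches."""
    site = site_for_address(source_ip, site_networks)
    return worker_view.get(site, [])


# Hypothetical site networks and Worker Node View.
site_networks = {
    "SITE-A": ["192.0.2.0/24"],
    "SITE-B": ["198.51.100.0/24", "2001:db8::/32"],
}
worker_node_view = {
    "SITE-A": ["squid.site-a.example:3128"],
    "SITE-B": ["squids.private.site-b.example:8080"],
}

print(proxies_for("192.0.2.17", site_networks, worker_node_view))
```
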

Rationale:

The responsibilities are very similar to things that are already being done. Storing squid information in GOCDB and OIM is new, but it is very much like other things that site administrators already do, and no new functionality is asked of these information systems, just a new field. CMS operations people already maintain a very similar translation file (to translate between the Worker Node View and the Monitoring View) on the existing squid monitoring server, and ATLAS operations maintains a Python script there that does effectively the same thing. There is one small difference in responsibility: currently, site administrators for CMS are required to put the Worker Node View into a local configuration file, but they will instead be instructed either to ask the CMS operations people to do it (only if it differs from the Information System View) or to put it into their own http://wpad/wpad.dat. This change is expected to take roughly a year, while frontier client installations catch up, and meanwhile it can be audited by a SAM test or on the squid monitoring server (which would compare against the CVS copy of the local configuration file, as is currently the case).
WPAD is the de facto internet standard for discovering proxies. It makes sense for proxy discovery to be different from the discovery of other grid information: using proxies to look up grid information (for example from ATP) is an excellent way to scale such systems, but the proxies can't be used until they have been discovered in some other way.
The solution is as simple as possible given the requirements.

Topic revision: r2 - 2013-01-09 - DaveDykstra
