This is a proposal for how to store & maintain squid configuration information for the WLCG.

Terms:

  • Squid machine: a computer that runs a squid process
  • Squid service: a squid machine or set of squid machines that perform a specific function or functions. Sites may have multiple squid services or a single one. Sites that accept opportunistic grid jobs are encouraged to have a squid service for opportunistic use that is separate from the production service.
  • Squid proxy: a squid service used as an http proxy as opposed to a reverse proxy.
  • Squid monitoring servers: the pair of machines implementing wlcg-squid-monitor.cern.ch

Squid configuration information will take 3 forms:

  1. Information System View: a list of squid services at all sites as public internet DNS names. If there is only one squid machine in a squid service, the DNS name can be the primary name or an alias for the machine, but if there are multiple squid machines in the service, the name must be a round-robin alias listing all of the IP addresses of the squid machines (that is, a single address of a hardware load balancer is not allowed). Does not include port numbers.
  2. Monitoring View: a list of individual public DNS names for each squid, with monitoring port numbers. The default port is 3401.
  3. Worker Node View: a list of squid proxies to use on worker nodes, with proxy port numbers. The default port is 3128. May be on a private network. If there are multiple squids, they may be behind a round-robin DNS alias or a hardware load balancer. May include backup proxies at other sites, to be tried if all previous proxies have failed. Figuring out how to store this is outside the scope of this task force, but it is defined here to show how it is distinct from the other two forms, and so that the design of the storage for those two forms does not interfere with this view.

For example: CMS site T2_ES_IFCA has 2 squids on a private network, known internally as squid01prv.ifca.es and squid02prv.ifca.es and externally as squid01.ifca.es and squid02.ifca.es. So far they have no round-robin names for these, but they would be required to create a private one and a public one, for example squidprv.ifca.es and squid.ifca.es. The Information System View would then be "squid.ifca.es", the Monitoring View would be squid01.ifca.es:3401 and squid02.ifca.es:3401, and the Worker Node View would be squidprv.ifca.es:3128.
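
As a quick illustration of the round-robin requirement, a name like squid.ifca.es should resolve to one IP address per squid machine. The following is only a sketch (it assumes the example alias above actually exists in DNS) using Python's standard socket module:

    # Check that a round-robin alias returns one A record per squid
    # machine, as the Information System View requires.
    import socket

    alias = "squid.ifca.es"  # example round-robin name from the text
    name, aliases, addresses = socket.gethostbyname_ex(alias)
    print("%s resolves to %d address(es): %s"
          % (alias, len(addresses), ", ".join(addresses)))
    # For the two-squid service above, both machines' IP addresses
    # should appear in the output.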

Details:

  1. Store the Information System View in GOCDB and OIM. Maintenance of the information is the responsibility of site administrators.
  2. Create the Monitoring View from the Information System View by means of translation files on the squid monitoring servers. There will be one file per VO for each of the two Views, all with the same simple format. Maintenance of the translation files will be the responsibility of operations personnel from each VO. The information system data (that is, site names, squid service names, and VO names) will be read either directly from GOCDB & OIM or via ATP. The CMS VO will also make use of a translation of site names from the Information System View into the CMS site names, either from ATP or directly from the CMS SiteDB. A simple site with only one squid on a public network using the default ports will need no translation entries, but sites with multiple squids, a private worker node network, or non-standard ports will need entries. The translation files will also be able to add whole sites that aren't in the Information System View, but that will be discouraged except for reverse proxies such as those on Frontier launchpads and CVMFS Stratum 1s. (A sketch of this translation step follows this list.)
  3. Whenever squid-related information is duplicated in more than one source, audits will regularly compare the copies, and notices will be sent to operations personnel when they don't match. The information can also be stored in different forms (e.g. in AGIS & ATP), but it should come from the above primary sources.
  4. The Monitoring View is only needed on the squid monitoring servers, so it doesn't need to be made available publicly.
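
The exact format of the translation files is not specified above. As a sketch of the intended translation step only, assume a hypothetical per-VO file with one line per squid machine, mapping a squid service name from the Information System View to a monitoring endpoint, with port 3401 assumed when none is given:

    # Hypothetical derivation of the Monitoring View from the
    # Information System View plus a per-VO translation file.
    # Assumed file format ('#' starts a comment):
    #   <service-name-from-info-system>  <machine-dns-name>[:<port>]
    # e.g.
    #   squid.ifca.es  squid01.ifca.es
    #   squid.ifca.es  squid02.ifca.es

    DEFAULT_MONITORING_PORT = 3401

    def monitoring_view(info_system_services, translation_path):
        """Return a dict {service name: [machine:port, ...]}."""
        translations = {}
        with open(translation_path) as f:
            for line in f:
                line = line.split('#', 1)[0].strip()
                if not line:
                    continue
                service, machine = line.split()
                if ':' not in machine:
                    machine = '%s:%d' % (machine, DEFAULT_MONITORING_PORT)
                translations.setdefault(service, []).append(machine)
        view = {}
        for service in info_system_services:
            # A simple site with one squid on a public network and
            # default ports needs no entry: monitor the service name
            # itself on the default port.
            view[service] = translations.get(
                service, ['%s:%d' % (service, DEFAULT_MONITORING_PORT)])
        return view

This matches the rule above that simple sites need no translation entries, while sites with multiple squids or non-standard ports do.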

Rationale:

  1. The responsibilities are very similar to things that are already being done. Storing squid information in GOCDB & OIM is new, but it is very much like other things that site administrators already do, and no new functionality is asked of these information systems, just a new field. CMS operations people already maintain a translation file very similar to this (to translate between the Worker Node View and the Monitoring View) on the existing squid monitoring server, and ATLAS operations maintains a Python script there that does effectively the same thing.
  2. The solution is as simple as possible given the requirements.

Defining how the Worker Node View is stored is outside the scope of this task force, but in order to have a design that allows for it in the future, here is one possible way it may be handled:

  1. Create the Worker Node View in a way analogous to the Monitoring View, with translation files on the squid monitoring servers.
  2. Generate internet-standard Web Proxy Auto Discovery (WPAD) files from the Worker Node View for each site, and supply a web service at http://wlcg-wpad.cern.ch/wpad.dat that every WLCG worker node may contact as frequently as once per job to find out what proxies to use. The correct wpad.dat will be returned based on the source IP address. Initially this service can be an alias for the squid monitoring servers, but later, depending on performance, it can be moved to different servers. The wpad.dat files will also be made available on wlcg-wpad.cern.ch, to be looked up by site name or IP address by anyone who wants to see them without running at a site. Individual sites may instead configure their own service at http://wpad/wpad.dat, which will take precedence, and large sites will be encouraged to do so for performance reasons (even if only as a reverse proxy of the centrally generated file). (A sketch of such a generated file follows this list.)
  3. The Worker Node View will only be made available to the public via the wpad.dat files, although it may be translated elsewhere to other forms (e.g. ATLAS may some day want to generate $FRONTIER_SERVER from it combined with frontier server and backup proxy information from AGIS). A tool will be provided to look up a list of proxies given a source address and destination URL. It shouldn't be necessary to put the Worker Node View into AGIS or ATP.
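
As an illustration of item 2, wpad.dat is a standard proxy auto-config (PAC) file, which is a small piece of JavaScript. The sketch below is hypothetical (the proxy name is taken from the IFCA example above) and only shows how one such file might be generated from a site's Worker Node View:

    # Hypothetical generation of a per-site wpad.dat (a standard PAC
    # file) from that site's Worker Node View.

    PAC_TEMPLATE = """function FindProxyForURL(url, host) {
        // Try each proxy in order (site proxies first, then any
        // cross-site backups), falling back to a direct connection.
        return "%s; DIRECT";
    }
    """

    def make_wpad(proxies):
        """proxies: ordered list like ['squidprv.ifca.es:3128']."""
        return PAC_TEMPLATE % "; ".join("PROXY " + p for p in proxies)

    # Worker Node View for the T2_ES_IFCA example above:
    print(make_wpad(["squidprv.ifca.es:3128"]))

Whether a generated PAC file should end with DIRECT or fail outright is a policy choice left open here; it appears above only to make the example complete.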
