BNL configuration notes and deployment of dCache 1.8 patch level 13
BNL's second dCache 1.8 instance, dcsrmv2.usatlas.bnl.gov, has been upgraded to patch level 13. The infrastructure of the system is as follows.
- Admin node (dcache02.usatlas.bnl.gov)
  - Memory: 8G
  - Available disk: 2TB
  - Services: Hosts dCache admin components such as PnfsManager, PoolManager and the administrative interface. In the test bed we did not separate PNFS from the dCache admin node, so this machine also hosts PNFS. For convenience, we also installed a dcap door on this node.
- SRM door (dcsrmv2.usatlas.bnl.gov)
  - Memory: 8G
  - Available disk: 2TB
  - Services: SRM2.2, Utility and SRM database
- GridFTP door (dcdoor99.usatlas.bnl)
  - Memory: 5G
  - Available disk: 30G
  - The machine is outside the BNL firewall and has two NICs
  - Services: GridFTP door with two interfaces
- Read/Write pools (dc002.usatlas.bnl.gov)
  - Thumper with SunOS 5.10
  - Memory: 16288 Megabytes
  - Available disk: 16 TB
  - Services: Four dCache pools (read/write).
The system uses HPSS as its backend tape storage system. The LinkGroup is considered Custodial. The PoolManager.conf file is attached.
- PoolManager.conf: A file describing the link, LinkGroup and pool relationships in dCache 1.8
-- Yingzi (Iris) Wu - 24 Aug 2007
BNL configuration notes and deployment of dCache 1.8 patch level 1
This information is presented in two parts. The first part shows the current configuration of the BNL test point used in Flavia's tests. The second part describes the experience with the previous dCache 1.8 patch level 1 installation.
Current BNL dCache 1.8 patch level 1 configuration
This is a standalone dCache 1.8 installation: all dCache 1.8 components run on the same server.
The configured storage class is REPLICA-ONLINE (Tape0Disk1).
BNL's dCache 1.8 patch level 1 is installed on one server with the following specifications:
- CPU speed 3400 MHz.
- mem_total 4149240 KB.
- Linux release 2.6.9-42.0.8.ELsmp.
- The server is located outside BNL's firewall, with no tape storage configured.
- Disk space for storage 60GB.
General notes on the dCache 1.8 patch level 1 installation and configuration
I used the information provided on the dCache website and Timur's website, together with notes from a previous standalone dCache installation (1.7).
Attached to this page are the main configuration files I used to deploy dCache 1.8 patch level 1.
For dCache installation and configuration:
- node_config
- pool_path
- dCacheSetup
- dcachesrm-gplazma.policy
- PoolManager.conf
- srm.batch
- utility.batch
- Pool setup file (setup), the same for all pools
Components of the dCache 1.8 installation:
As can be seen from the configuration files I used, the installation consists of:
- 3 write pools of 20GB each
- gPlazma turned on
- 1 GridFTP door
- 1 dcap door
- 1 GSIDCAP door
- Pnfs
- Admin cell
Configuration files customized for this installation:
Besides the parameters that had to be adjusted to install dCache on a single server with the features mentioned above, the following parameters were changed:
dCacheSetup
- srmCopyReqThreadPoolSize=12
- remoteGsiftpMaxTransfers=12 (assuming 4 GridFTP transfers per pool and 3 pools, so 4*3=12)
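The sizing rule behind these two values can be written out as a short Python sketch. The function name `transfer_limit` is made up for illustration and is not part of dCache; only the figures (3 pools, 4 GridFTP transfers per pool) and the two parameter names come from the notes above.

```python
def transfer_limit(pools: int, transfers_per_pool: int) -> int:
    """Return pools * transfers_per_pool, the rule of thumb used in these notes."""
    if pools < 0 or transfers_per_pool < 0:
        raise ValueError("pools and transfers_per_pool must be non-negative")
    return pools * transfers_per_pool


# Figures from the notes: 3 pools, 4 GridFTP transfers per pool.
limit = transfer_limit(pools=3, transfers_per_pool=4)

# Both dCacheSetup parameters were set to the same computed value.
settings = {
    "srmCopyReqThreadPoolSize": limit,
    "remoteGsiftpMaxTransfers": limit,
}

for name, value in settings.items():
    print(f"{name}={value}")
```

The printed lines have the same `name=value` form as the dCacheSetup entries listed above, so the result can be compared with them directly.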
PoolManager.conf
Using the admin shell on the SRM cell, I changed these two parameters:
(SRM-dct00) admin > set max ready get 20
(SRM-dct00) admin > set max ready put 20
References for installation
- General instructions were taken from the dCache Book and Timur's page.
- SRM configuration
http://home.fnal.gov/~timur/dCacheBook/cf-srm.html
http://www.dcache.org/manuals/Book/cf-srm.shtml
http://www.dcache.org/
Experience with dCache 1.8
The following summarizes how the installation behaved before the BNL test point passed Flavia's tests.
I observed the system in three states:
- State 1: After a fresh reboot the system reached its best performance. In this state the following tests failed:
  - ReleasedFiles
  - Mv (this test returned returnStatus=SRM_REQUEST_QUEUED)
- State 2: After a clean start and more than 10 hours of operation, the number of tests that returned SRM_REQUEST_QUEUED increased:
  - 06_StatusOfBringOnlineRequest
  - 09_ReleaseFiles
  - 02_StatusOfPutRequest
  - 04_PutDone
  - 05_PrepareToGet
  - 05_StatusOfGetRequest
- State 3: The following tests reported "globus-url-copy failed 137":
  - 05_StatusOfGetRequest
  - 06_BringOnline
  - 06_StatusOfBringOnlineRequest
  - 09_ReleaseFiles
Tracing a specific test that reported SRM_REQUEST_QUEUED
Looking into the dCache log files, such as catalina.out, and tracing the file used to perform this test, I found the following information:
05_StatusOfGetRequest: Executing srmPrepareToPut, putRequestToken=-2147472416
05_StatusOfGetRequest: fileRequests.expectedFileSize[{2691 }]
05_StatusOfGetRequest: desiredFileStorageType=PERMANENT
05_StatusOfGetRequest: srmPrepareToPut, returnStatus=SRM_REQUEST_QUEUED
05_StatusOfGetRequest: srmStatusOfPutRequest, returnStatus=SRM_SUCCESS
05_StatusOfGetRequest: srmStatusOfPutRequest, remainingTotalRequestTime=
05_StatusOfGetRequest: srmPutDone, fileStatuses=surl0=srm://dct00.usatlas.bnl.gov:8443/srm/managerv2?SFN=//pnfs/usatlas.bnl.gov/data/dteam/20070524-210113-28241-0.txt returnStatus.explanation0=Done returnStatus.statusCode0=SRM_SUCCESS
05_StatusOfGetRequest: Put cycle succeeded
05_StatusOfGetRequest: srmPrepareToGet, getRequestToken=-2147472414
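When many tests are traced this way, it can help to tally which SRM calls came back with SRM_REQUEST_QUEUED. The Python sketch below parses lines of the form shown in the excerpt above. The line format is inferred from that excerpt only, and the helper names `statuses_by_call` and `queued_calls` are hypothetical; they are not part of the test suite or of dCache.

```python
import re
from collections import defaultdict

# Three lines copied from the excerpt above; a real run would read the test log file.
LOG = """\
05_StatusOfGetRequest: srmPrepareToPut, returnStatus=SRM_REQUEST_QUEUED
05_StatusOfGetRequest: srmStatusOfPutRequest, returnStatus=SRM_SUCCESS
05_StatusOfGetRequest: Put cycle succeeded
"""

# Assumed format: "<test>: <srmCall>, returnStatus=<SRM_STATUS>"
LINE_RE = re.compile(
    r"^(?P<test>\w+): (?P<call>\w+), returnStatus=(?P<status>SRM_\w+)$"
)


def statuses_by_call(text):
    """Map each SRM call name to the list of return statuses seen for it."""
    result = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines without a returnStatus field are skipped
            result[match.group("call")].append(match.group("status"))
    return dict(result)


def queued_calls(text):
    """Return the sorted names of calls that returned SRM_REQUEST_QUEUED."""
    return sorted(
        call
        for call, statuses in statuses_by_call(text).items()
        if "SRM_REQUEST_QUEUED" in statuses
    )


print(queued_calls(LOG))
```

For the three sample lines, only srmPrepareToPut is reported as queued, which matches the excerpt: the put request was queued first and the later status poll returned SRM_SUCCESS.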
Using the pnfs id assigned to this file in the SRM log, 000100000000000000095F48 (/pnfs/usatlas.bnl.gov/data/dteam/20070524-210113-28241-0.txt),
it should be possible to locate the file on a particular pool:
(PnfsManager) admin > cacheinfoof 000100000000000000095F48
cacheinfoof 000100000000000000095F48
No pool was returned
However, the file does exist on the pool:
[root@dct00 data]# pwd
/data/data5/dcache_pool_5/pool/data
[root@dct00 data]# ls -l 000100000000000000095F48
-rw-r--r-- 1 root root 2691 May 24 15:01 000100000000000000095F48
Then, looking in the admin interface of the pool:
(dct00_5) admin > rep ls -l 000100000000000000095F48
rep ls -l 000100000000000000095F48
000100000000000000095F48 <C-------X--(0)[0]> 2691 si={myStore:STRING}
Reinstallation of the dCache1.8 patch level 1
To get a clean, fresh installation of the dCache components, I decided to reinstall dCache 1.8 patch level 1, including the databases and pnfs. Nevertheless, I kept the previous dCache configuration files and used them to configure the new installation, with the changes listed below.
Changes to the configuration for the new deployment of dCache 1.8
- Reduced the number of write pools from 5 to 3.
- Increased the pool timeout from 120 to 240.
- srmCopyReqThreadPoolSize set to 12 (assuming 4 GridFTP transfers per pool and 3 pools, so 4*3=12); previously this parameter was 25, assuming 5 GridFTP transfers per pool and 5 pools.
- remoteGsiftpMaxTransfers=12; previously 25.
- maxReadyJobs=20; previously 25.
- Mover queue of 30 per pool; the previous installation, with 5 pools, used 18 per pool.
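To compare the two deployments side by side, the figures in this list can be put into a small Python sketch. The `Deployment` class and its field names are invented for illustration; the numbers are the ones given above. Summing the per-pool mover queue across pools is plain arithmetic on those numbers and is not a statement about how dCache schedules movers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Illustrative container for the tuning figures quoted in the notes."""
    pools: int
    gridftp_per_pool: int       # assumed GridFTP transfers per pool
    mover_queue_per_pool: int
    max_ready_jobs: int

    @property
    def transfer_limit(self) -> int:
        # Value used for srmCopyReqThreadPoolSize and remoteGsiftpMaxTransfers
        return self.pools * self.gridftp_per_pool

    @property
    def total_mover_queue(self) -> int:
        # Per-pool mover queue summed over all pools (arithmetic only)
        return self.pools * self.mover_queue_per_pool


previous = Deployment(pools=5, gridftp_per_pool=5,
                      mover_queue_per_pool=18, max_ready_jobs=25)
current = Deployment(pools=3, gridftp_per_pool=4,
                     mover_queue_per_pool=30, max_ready_jobs=20)

for label, dep in (("previous", previous), ("current", current)):
    print(f"{label}: transfer limit={dep.transfer_limit}, "
          f"total mover queue={dep.total_mover_queue}, "
          f"maxReadyJobs={dep.max_ready_jobs}")
```

With these figures the summed mover queue is 90 in both deployments (5*18 and 3*30), while the transfer limit drops from 25 to 12.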
After applying these changes, the test point passed Flavia's tests. It seems to me that the solution was to tune the system to the load generated by the test requests, so that consecutive requests did not stay in the queue.
-- Main.cgamboa - 21 Jun 2007