Re: cannot read file from dcache

From: bakken <bakken@fnal.gov>
Date: Fri Dec 23 2005 - 00:43:12 MET

Go to the pool and look in its log; it should
tell you why it suspended itself.

Retries don't help on suspended pools, because the
pool isn't available. That is why the transfer
is suspended.
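
For the log search itself, here is a minimal sketch in Python, assuming the
domain and pool logs are plain text files in one directory. The function
name find_pnfsid and the "*.log" file pattern are hypothetical; adjust
them to wherever your installation writes its logs.

```python
from pathlib import Path


def find_pnfsid(log_dir, pnfsid, pattern="*.log"):
    """Return (file name, line number, line) for each log line containing pnfsid.

    log_dir and pattern are assumptions about the log layout, not dCache defaults.
    """
    hits = []
    for log_file in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log contains odd bytes
        with open(log_file, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if pnfsid in line:
                    hits.append((log_file.name, number, line.rstrip("\n")))
    return hits
```

Call it with the log directory and the pnfsid from 'rc ls', then print the
returned tuples. The last line that mentions the pnfsid in each log shows
where processing stopped.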

-- Jon There's no place like 127.0.0.1
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Jon A. Bakken bakken@fnal.gov (630) 840-4790

On Thu, 22 Dec 2005, Sergey Chechelnitskiy wrote:

> Hi,
>
> I found one file which is suspended:
>
> (PoolManager) admin > rc ls
> 00010000000000000031F0C8@0.0.0.0/0.0.0.0 m=6 r=0 [unknown] [Suspended (pool
> unavailable) 12.22 14:09:30] {1010,Suspend}
>
> retry (rc retry) with the -si-refresh option doesn't help:
> (PoolManager) admin > rc retry -si-refresh 00010000000000000031F0C8
> dmg.util.CommandThrowableException: (3) java.lang.IllegalArgumentException:
> Not found : 00010000000000000031F0C8 from ac_rc_retry_$_1
>
> What else can I do?
> Sergey
>
> On Thursday 22 December 2005 14:51, Sergey Chechelnitskiy wrote:
>> Hi,
>>
>> Thank you. The problem is still present, but I agree: I see normal behavior
>> with the timestamps. We have no firewall or NAT issues. Loglevel is set to 3
>> for all domains.
>> I still cannot read the file from pnfs. Srm, gridftp and dcap just hang
>> without any error. The corresponding pool file is not listed among the
>> suspended files (pool unavailable). Only pnfsDomain.log and
>> dCacheDomain.log are updated, and no errors appear in either.
>> For the "wrong" file, dCacheDomain.log stops at
>>
>> 12/22 14:31:11 Cell(RoutingMgr@dCacheDomain) : Adding :
>> GFTP-wormhole-Unknown-322
>> 12/22 14:31:12 Cell(PoolManager@dCacheDomain) : Adding request for :
>> 0001000000000000003DA7F0@0.0.0.0/0.0.0.0
>>
>> while for a good file it continues with
>>
>> 12/22 14:31:12 Cell(PoolManager@dCacheDomain) : 0001000000000000003DA7F0 :
>> Adding Object : null
>> 12/22 14:31:12 Cell(PoolManager@dCacheDomain) : 0001000000000000003DA7F0 :
>> Starting Engine
>> 12/22 14:31:12 Cell(PoolManager@dCacheDomain) : 0001000000000000003DA7F0 :
>> ACTIVATING STATE ENGINE 0001000000000000003DA7F0 0
>> 12/22 14:31:12 Cell(PoolManager@dCacheDomain) : 0001000000000000003DA7F0 :
>> StageEngine called in mode Init with object (NULL)
>> 12/22 14:31:12 Cell(PoolManager@dCacheDomain) : PFL
>> [0001000000000000003DA7F0] : calculateFileAvailableMatrix _expectedFromPnfs
>> : [gridstore_1]
>> and so on.
>>
>> For the "bad" file it just stops, with no more entries.
>> I re-uploaded this file to its original location in pnfs and ran the tests
>> again: now it is OK. maxLogin for gridftp and get-req-thread-pool-size in
>> srm are set to 300, and get-lifetime=3600000 for srm
>> (we had "Connection denied 101 > 100" in the logfile).
>>
>> If we get new files that cannot be read, I will provide all the logfiles.
>> Regards,
>> Sergey
>>
>> On Wednesday 21 December 2005 20:27, bakken wrote:
>>>> then the appropriate file in pool is also deleted. So, pnfs knows how
>>>> to match the pnfs-file to pool-file. But we cannot read it by dcache
>>>> means.
>>>
>>> ssh to the PoolManager cell and do 'rc ls' - see if your file's pnfsid is
>>> listed and what its state is -- if so, do a retry (rc retry) with the
>>> -si-refresh option. Similar information can be gleaned from the lazy
>>> restore web page, but you have to be in the pool manager cell to issue
>>> the retry.
>>>
>>>> Questions:
>>>> 1. Where should we look to find which command affected this file?
>>>
>>> I'd start with the log files for the dcache domain where the poolmanager
>>> cell is and the pool cell, and see what it says. Grep for the pnfsid and
>>> see if that gives you a clue of where to look.
>
> --
> Best Regards,
> Sergey Chechelnitskiy (chech@sfu.ca)
> WestGrid/SFU
>
Received on Fri Dec 23 00:42:45 2005