But it does exist:
in pool:
root@gridstore:/projects/SE/data/pool/data> ls -l 00010000000000000031F0C8
-rw-r----- 1 atlas atlas 259509386 Dec 20 05:29 00010000000000000031F0C8
in pnfs:
-rw-r--r-- 1 atlas atlas 259509386 Dec 20 05:29 mc11.004202.ZmumuJimmy.digit.RDO.v11000301._00019.pool.root.1
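
The same comparison can be scripted if more files need checking. This is only a rough sketch, not a dCache tool: compare_replica is a made-up helper name, and it assumes pnfs is mounted so that a plain stat on the pnfs path reports the registered size, as ls -l does above. Both paths are supplied by the caller.

```python
import os


def compare_replica(pool_path, pnfs_path):
    """Return (exists_in_pool, exists_in_pnfs, sizes_match).

    pool_path: file in the pool data directory, named by its pnfsid.
    pnfs_path: the corresponding entry in the pnfs namespace.
    Both are assumptions supplied by the caller; nothing is looked up here.
    """
    pool_ok = os.path.isfile(pool_path)
    pnfs_ok = os.path.isfile(pnfs_path)
    # Only compare sizes when both files are actually present.
    sizes_match = bool(
        pool_ok
        and pnfs_ok
        and os.stat(pool_path).st_size == os.stat(pnfs_path).st_size
    )
    return pool_ok, pnfs_ok, sizes_match
```

For the file above one would pass the pool file 00010000000000000031F0C8 and its pnfs entry. A matching size only shows that the replica is on disk; it does not explain why the PoolManager request is suspended.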
Thank you for your help!
Sergey
On Thursday 22 December 2005 15:47, Sergey Chechelnitskiy wrote:
> Hi,
>
> Thanks.
> Do you mean that this file in the pool is not available?
> The pool itself is available - we have only one pool and the other files
> are doing well.
> For this file I found only
>
> root@gridstore:/projects/SE/d-cache/log> grep 00010000000000000031F0C8 gridstoreDomain.log
> 12/20 05:29:42 Cell(gridstore_1@gridstoreDomain) : getChecksumFromPnfs : No crc available for 00010000000000000031F0C8
>
> What could it tell me ?
> Sergey
>
> On Thursday 22 December 2005 15:43, bakken wrote:
> > go to the pool and look in the log and it should
> > tell you why it suspended itself.
> >
> > retries don't help on suspended pools, because the
> > pool isn't available - that is why the transfer
> > is suspended
> >
> > -- Jon There's no place like 127.0.0.1
> > =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> > Jon A. Bakken bakken@fnal.gov (630) 840-4790
> >
> > On Thu, 22 Dec 2005, Sergey Chechelnitskiy wrote:
> > > Hi,
> > >
> > > I found one file which is suspended:
> > >
> > > (PoolManager) admin > rc ls
> > > 00010000000000000031F0C8@0.0.0.0/0.0.0.0 m=6 r=0 [unknown] [Suspended (pool unavailable) 12.22 14:09:30] {1010,Suspend}
> > >
> > > retry (rc retry) with the -si-refresh option doesn't help:
> > > (PoolManager) admin > rc retry -si-refresh 00010000000000000031F0C8
> > > dmg.util.CommandThrowableException: (3) java.lang.IllegalArgumentException: Not found : 00010000000000000031F0C8 from ac_rc_retry_$_1
> > >
> > > What else can I do ?
> > > Sergey
> > >
> > > On Thursday 22 December 2005 14:51, Sergey Chechelnitskiy wrote:
> > >> Hi,
> > >>
> > > >> Thank you. The problem is still present. But I agree - I see normal
> > > >> behavior with the timestamps. We have no firewall or NAT issues.
> > > >> Loglevel is set to 3 for all domains.
> > > >> I still cannot read the file from pnfs. Srm, gridftp and dcap just hang
> > > >> without any error. The corresponding pool file is not listed among the
> > > >> suspended files (pool unavailable). Only pnfsDomain.log and
> > > >> dCacheDomain.log are updated, and no errors appear in either.
> > > >> For the "wrong" file, dCacheDomain.log stops at
> > >>
> > >> 12/22 14:31:11 Cell(RoutingMgr@dCacheDomain) : Adding :
> > >> GFTP-wormhole-Unknown-322
> > >> 12/22 14:31:12 Cell(PoolManager@dCacheDomain) : Adding request for :
> > >> 0001000000000000003DA7F0@0.0.0.0/0.0.0.0
> > >>
> > > >> while for a good file it continues with
> > >>
> > > >> 12/22 14:31:12 Cell(PoolManager@dCacheDomain) : 0001000000000000003DA7F0 : Adding Object : null
> > > >> 12/22 14:31:12 Cell(PoolManager@dCacheDomain) : 0001000000000000003DA7F0 : Starting Engine
> > > >> 12/22 14:31:12 Cell(PoolManager@dCacheDomain) : 0001000000000000003DA7F0 : ACTIVATING STATE ENGINE 0001000000000000003DA7F0 0
> > > >> 12/22 14:31:12 Cell(PoolManager@dCacheDomain) : 0001000000000000003DA7F0 : StageEngine called in mode Init with object (NULL)
> > > >> 12/22 14:31:12 Cell(PoolManager@dCacheDomain) : PFL [0001000000000000003DA7F0] : calculateFileAvailableMatrix _expectedFromPnfs : [gridstore_1]
> > >>
> > >> and so on.
> > >>
> > > >> For the "bad" file it just stops - no more entries.
> > > >> I re-uploaded this file to its original place in pnfs and ran the
> > > >> tests - now ok. maxLogin for gridftp and get-req-thread-pool-size in
> > > >> srm are set to 300, and get-lifetime=3600000 for srm
> > > >> (we had "Connection denied 101 > 100" in the logfile).
> > > >>
> > > >> If we get new files that cannot be read, I will send all the
> > > >> logfiles.
> > > >> Regards,
> > >> Sergey
> > >>
> > >> On Wednesday 21 December 2005 20:27, bakken wrote:
> > >>>> then the appropriate file in pool is also deleted. So, pnfs knows
> > >>>> how to match the pnfs-file to pool-file. But we cannot read it by
> > >>>> dcache means.
> > >>>
> > >>> ssh to the PoolManager cell and do 'rc ls' - see if your file's
> > >>> pnfsid is listed and what its state is -- if so, do a retry (rc
> > >>> retry) with the -si-refresh option. Similar information can be
> > >>> gleaned from the lazy restore web page, but you have to be in the
> > >>> pool manager cell to issue the retry.
> > >>>
> > >>>> Questions:
> > >>>> 1.where should we look for which command affected this file
> > >>>
> > >>> I'd start with the log files for the dcache domain where the
> > >>> poolmanager cell is and the pool cell, and see what it says. Grep
> > >>> for the pnfsid and see if that gives you a clue of where to look.
> > >
> > > --
> > > --
> > > Best Regards,
> > > Sergey Chechelnitskiy (chech@sfu.ca)
> > > WestGrid/SFU
--
Best Regards,
Sergey Chechelnitskiy (chech@sfu.ca)
WestGrid/SFU
Received on Fri Dec 23 00:48:39 2005