FAH Errors


zhotfire
24th May 2003, 03:11
Here are some interesting log entries from a crash:

[04:46:26] Completed 285000 out of 500000 steps (57)
[04:54:59] Writing local files
[04:54:59] Completed 290000 out of 500000 steps (58)
[05:03:34] Writing local files
[05:03:34] Completed 295000 out of 500000 steps (59)
[05:12:11] Writing local files
[05:12:11] Completed 300000 out of 500000 steps (60)
[Unrelated data follows in the log: readable fragments of Windows class definitions from the MSI provider (MSIProv), namely Win32_Condition, Win32_EnvironmentSpecification, Win32_LaunchCondition, Win32_ODBCDataSourceSpecification, Win32_ODBCDriverSpecification, Win32_ODBCTranslatorSpecification, Win32_ProgIDSpecification, Win32_ReserveCost and Win32_ServiceSpecification, plus serial-port properties (Parity, ParityCheckEnabled, RTSFlowControlType) mapped to Win32Registry|Hardware\DeviceMap\SerialComm and Win32API|Communication Structures|DCB, then a run of unreadable binary data.]
That was from client #2 on the Tiger MP.

zhotfire
24th May 2003, 03:15
This is from client #1 on the Tiger MP:
[04:17:04] Writing local files
[04:17:04] Completed 250000 out of 500000 steps (50)
[04:30:39] Writing local files
[04:30:39] Completed 255000 out of 500000 steps (51)
[04:44:12] Writing local files
[04:44:12] Completed 260000 out of 500000 steps (52)
[04:57:48] Writing local files
[04:57:48] Completed 265000 out of 500000 steps (53)
[05:11:23] Writing local files
[05:11:23] Completed 270000 out of 500000 steps (54)
[unreadable binary data]
1901SOL HW1 6174 3.101 5.013 0.757 -0.7665 -0.1051 -0.6985
1901SOL HW2 6175 3.085 5.146 0.686 1.6135 0.8383 0.4145
1902SOL OW 6176 0.802 5.746 0.171 0.4089 -0.3002 -0.0713
1902SOL HW1 6177 0.789 5.686 0.245 0.9246 -0.7084 -0.3032
1902SOL HW2 6178 0.822 5.689 0.097 0.3805 0.1327 -0.4178
1903SOL OW 6179 3.863 3.769 1.970 -0.2443 -0.2927 -0.9724
1903SOL HW1 6180 3.780 3.723 1.962 0.1837 -1.0800 -0.9597
1903SOL HW2 6181 3.891 3.783 1.880 -0.6731 0.5086 -0.9897
1904SOL OW 6182 3.551 4.918 4.659 -0.4732 0.4340 0.1638
1904SOL HW1 6183 3.457 4.933 4.672 -0.2305 1.0033 1.3190
1904SOL HW2 6184 3.594 4.965 4.730 0.6501 -0.5543 0.1824
1905SOL OW 6185 2.940 3.342 0.035 0.3317 1.0366 0.2225
1905SOL HW1 6186 3.028 3.360 0.068 0.9503 0.9772 -1.3194
1905SOL HW2 6187 2.930 3.400 -0.041 0.9418 -2.1636 -2.5792
1906SOL OW 6188 0.342 3.481 3.457 -0.1463 -0.2055 0.2107
1906SOL HW1 6189 0.403 3.471 3.530 -1.8631 1.1235 1.9378
1906SOL HW2 6190 0.312 3.392 3.439 0.1810 -0.5521 1.2498
1907SOL OW 6191 1.641 4.932 2.828 -0.7543 -0.0741 -0.2464
1907SOL HW1 6192 1.594 4.874 2.887 1.2401 -0.5271 0.9710
1907SOL HW2 6193 1.684 4.874 2.766 1.8591 -0.0851 1.4218
1908SOL OW 6194 3.141 4.736 3.403 -0.1325 -0.7856 0.7903
1908SOL HW1 6195 3.210 4.699 3.347 -0.7401 -0.1812 -0.4011
1908SOL HW2 6196 3.102 4.660 3.446 -0.0314 -1.3401 -0.0647
1909SOL OW 6197 4.085 4.472 4.792 0.5198 -0.4441 0.1386
1909SOL HW1 6198 4.135 4.423 4.858 0.4549 -0.7503 -0.0336
1909SOL HW2 6199 4.132 4.456 4.710 -0.7856 -2.1736 -0.3500
1910SOL OW 6200 4.331 5.155 3.113 0.3980 -0.2496 0.0320
1910SOL HW1 6201 4.274 5.231 3.109 -1.0252 -1.2974 -0.1514
1910SOL HW2 6202 4.418 5.189 3.092 -0.1123 1.4007 0.4640
1911SOL OW 6203 3.574 4.076 4.728 0.2964 -0.5319 -0.3044
1911SOL HW1 6204 3.488 4.119 4.732 0.2929 -0.5703 0.0506
1911SOL HW2 6205 3.559 3.997 4.677 0.1051 -0.5742 -0.1831
1912SOL OW 6206 5.666 1.185 2.251 0.1846 0.1914 -0.3986
1912SOL HW1 6207 5.708 1.166 2.167 -0.9071 0.3265 -0.9931
1912SOL HW2 6208 5.599 1.118 2.260 0.3920 0.0815 0.4605
1913SOL OW 6209 2.211 0.995 4.858 -0.2023 0.2638 -0.0050
1913SOL HW1 6210 2.253 0.984 4.773 -2.8820 -0.1540 -1.3756
1913SOL HW2 6211 2.216 1.089 4.876 0.0768 0.3968 -0.7698
1914SOL OW 6212 5.112 1.413 3.149 -0.2973 0.5152 -0.2569
1914SOL HW1 6213 5.191 1.441 3.103 -0.9746 1.3616 -0.9435
1914SOL HW2 6214 5.072 1.495 3.180 -0.9820 0.1104 -0.0488
1915SOL OW 6215 4.821 1.527 0.571 0.1290 -0.1370 -0.5104
1915SOL HW1 6216 4.892 1.464 0.583 1.7052 1.8447 1.6217
1915SOL HW2 6217 4.863 1.603 0.530 -1.2240 0.3056 -1.1227
1916SOL OW 6218 5.153 0.248 0.129 -0.8220 -0.3253 0.2206
1916SOL HW1 6219 5.218 0.306 0.168 -2.1078 -0.0800 2.1386
1916SOL HW2 6220 5.188 0.160 0.143 -0.6745 -0.1737 0.8359
1917SOL OW 6221 1.058 2.661 1.817 0.1384 0.0563 -0.6598
1917SOL HW1 6222 1.042 2.605 1.892 0.5240 1.4192 0.4829
1917SOL HW2 6223 0.973 2.671 1.775 -0.0642 -0.7605 -0.4543
1918SOL OW 6224 1.341 1.446 2.049 0.1465 -0.0949 0.4436
1918SOL HW1 6225 1.357 1.368 1.996 1.1941 0.0242 0.5582
1918SOL HW2 6226 1.246 1.456 2.049 0.0235 -1.1361 0.0952
1919SOL OW 6227 0.917 2.136 1.924 -0.5904 -0.0188 0.1669
1919SOL HW1 6228 0.851 2.140 1.855 0.1216 -1.8428 -0.7002
1919SOL HW2 6229 0.999 2.120 1.877 0.4795 1.9815 1.2214
Whatcha think, Bruce? This locked the machine solid. The client could not resume the WU and had to start it over.
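The numeric rows pasted above look like GROMACS coordinate records (residue number and name, atom name, atom number, three position values, three velocity values), which would fit with data from another FAH file leaking into the log. Here is a minimal Python sketch for picking such rows out of a damaged log. It assumes the whitespace-separated layout seen in this post rather than the fixed-width columns of a real .gro file, and the names `GroAtom` and `parse_record` are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class GroAtom:
    resnum: int                         # e.g. 1901
    resname: str                        # e.g. "SOL" (solvent water)
    atom: str                           # e.g. "OW", "HW1", "HW2"
    index: int                          # running atom number
    pos: Tuple[float, float, float]     # position columns
    vel: Tuple[float, float, float]     # velocity columns

# First field is the residue number fused with the residue name, e.g. "1901SOL".
_RESIDUE = re.compile(r"^(\d+)([A-Za-z]\w*)$")

def parse_record(line: str) -> Optional[GroAtom]:
    """Return a GroAtom if the line looks like a coordinate record, else None."""
    fields = line.split()
    if len(fields) != 9:
        return None
    m = _RESIDUE.match(fields[0])
    if m is None:
        return None
    try:
        index = int(fields[2])
        nums = [float(f) for f in fields[3:]]
    except ValueError:
        return None
    return GroAtom(int(m.group(1)), m.group(2), fields[1], index,
                   (nums[0], nums[1], nums[2]), (nums[3], nums[4], nums[5]))
```

Running `parse_record` over every line of a suspect log and counting the non-`None` results gives a rough idea of how much of the file is stray coordinate data.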

zhotfire
25th May 2003, 01:34
Another day, another crash. The CPUs are TBred "A"s. Don't know if it's a coincidence, but the crashes started just after the Gromacs core became official. Seems weird though... I mean, why would it run the beta perfectly fine, then crash & burn with the official release? *sigh* Now running the clients without the "-forceasm" flag... let's see if that points to the true problem.

Bruce
27th May 2003, 17:54
You're probably running a FAT file system, not NTFS (or another journaling file system).

After a crash, it is relatively common for FAHlog to contain garbage data from some other application (or even from one of the other FAH files). I don't know what programming tricks they use to manage FAHlog, but the file is not as robust as it should be. On my own machines, I've seen occasional errors like that, and often the client was still running correctly. When I restarted the client, it didn't resume from the last frame reported in the log, but from the one it should have reached by that time.

Stanford has never duplicated the problem, so I don't anticipate a fix, particularly since it (apparently) doesn't cut into the science. The occasional corruption errors reported in other files are significantly less frequent, so I doubt they really are the same problem, although they do cause the restart of a WU and are therefore a more serious issue.
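If you want to check a FAHlog for this kind of garbage without reading it by eye, something like the following Python sketch might help. It assumes every genuine log line starts with a `[HH:MM:SS]` timestamp, which is inferred only from the excerpts in this thread; a real log may contain other legitimate line types, so treat the result as a hint. The function name `scan_log` is made up for illustration.

```python
import re

# A genuine log line, as seen in this thread: "[05:12:11] Completed 300000 out of 500000 steps (60)"
LOG_LINE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] (.*)$")
PROGRESS = re.compile(r"^Completed (\d+) out of (\d+) steps")

def scan_log(lines):
    """Return (last_progress, first_bad).

    last_progress: (done, total) from the last 'Completed' line seen
                   before any suspect line, or None if there was none.
    first_bad:     1-based number of the first non-blank line that does
                   not start with a timestamp, or None if the log is clean.
    """
    last_progress = None
    first_bad = None
    for n, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text.strip():
            continue  # blank lines are not treated as corruption
        m = LOG_LINE.match(text)
        if m is None:
            if first_bad is None:
                first_bad = n
            continue
        p = PROGRESS.match(m.group(4))
        if p and first_bad is None:
            last_progress = (int(p.group(1)), int(p.group(2)))
    return last_progress, first_bad
```

Since the garbage is arbitrary bytes, opening the file with `encoding="latin-1"` avoids decode errors while scanning. Keep in mind that the last step reported in the log is not necessarily where the client will resume.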

zhotfire
28th May 2003, 01:16
Running fine without the -forceasm flag... no more lockups. I miss my SSE boost... :(