Opened at 2010-11-05T15:33:55Z
Last modified at 2010-11-16T01:27:03Z
#1250 closed defect
trac timeline doesn't like davidsarah (maybe all admin accounts?) — at Version 7
Reported by: | davidsarah | Owned by: | somebody |
---|---|---|---|
Priority: | minor | Milestone: | soon (release n/a) |
Component: | dev-infrastructure | Version: | n/a |
Keywords: | trac | Cc: | |
Launchpad Bug: |
Description (last modified by davidsarah_test)
http://tahoe-lafs.org/trac/tahoe-lafs/timeline gives a 500 Internal Server Error. This has been broken for a while (a few months?), IIRC. The message is completely generic, but says that "More information about this error may be available in the server error log."
Workarounds:
- use http://tahoe-lafs.org/trac/tahoe-lafs/log/trunk/ to see trunk commits, and the tahoe-dev list to follow comments on the trac.
- log out, or log in with a non-admin account.
Change History (7)
comment:1 Changed at 2010-11-05T16:57:34Z by zooko
- Summary changed from trac timeline is broken to trac timeline occasionally gives errors
comment:2 Changed at 2010-11-12T13:42:48Z by zooko
Perhaps the underlying hard drives are dying:
root@tahoe-lafs:~# smartctl --all /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.9 family
Device Model:     ST3400633AS
Serial Number:    3PM0AE7Y
Firmware Version: 3.AAD
User Capacity:    400,088,457,216 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Nov 12 13:41:32 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 430) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 179) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   095   079   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       26
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   086   060   030    Pre-fail  Always       -       415054934
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10759
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       27
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   081   059   045    Old_age   Always       -       19 (Lifetime Min/Max 18/26)
194 Temperature_Celsius     0x0022   019   041   000    Old_age   Always       -       19 (0 18 0 0)
195 Hardware_ECC_Recovered  0x001a   060   052   000    Old_age   Always       -       81757662
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      7243         -
# 2  Short captive       Interrupted (host reset)      70%      7243         -
# 3  Extended captive    Interrupted (host reset)      30%      7237         -
# 4  Short offline       Completed without error       00%      7237         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@tahoe-lafs:~# smartctl --all /dev/sdb
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.9 family
Device Model:     ST3400633AS
Serial Number:    3PM09JNE
Firmware Version: 3.AAD
User Capacity:    400,088,457,216 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Nov 12 13:41:35 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 249) Self-test routine in progress... 90% of test remaining.
Total time to complete Offline data collection: ( 430) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 179) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   073   055   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   090   090   036    Pre-fail  Always       -       413
  7 Seek_Error_Rate         0x000f   089   060   030    Pre-fail  Always       -       830320960
  9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       33400
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       30
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       53445
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   083   048   045    Old_age   Always       -       17 (64 84 24 16)
194 Temperature_Celsius     0x0022   017   052   000    Old_age   Always       -       17 (0 14 0 0)
195 Hardware_ECC_Recovered  0x001a   049   046   000    Old_age   Always       -       2845069
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
I don't know how to interpret these numbers. Does this mean those drives are failing?
comment:3 Changed at 2010-11-13T06:45:47Z by zooko
Later, after much normal activity plus some runs of hdparm: (note that you have to scroll down for /dev/sdb results)
root@tahoe-lafs:~# smartctl --all /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.9 family
Device Model:     ST3400633AS
Serial Number:    3PM0AE7Y
Firmware Version: 3.AAD
User Capacity:    400,088,457,216 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Nov 13 06:44:00 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 430) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 179) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   095   079   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       26
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   086   060   030    Pre-fail  Always       -       415146516
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10775
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       27
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   079   059   045    Old_age   Always       -       21 (Lifetime Min/Max 18/26)
194 Temperature_Celsius     0x0022   021   041   000    Old_age   Always       -       21 (0 18 0 0)
195 Hardware_ECC_Recovered  0x001a   061   052   000    Old_age   Always       -       99357267
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      7243         -
# 2  Short captive       Interrupted (host reset)      70%      7243         -
# 3  Extended captive    Interrupted (host reset)      30%      7237         -
# 4  Short offline       Completed without error       00%      7237         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@tahoe-lafs:~# smartctl --all /dev/sdb
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.9 family
Device Model:     ST3400633AS
Serial Number:    3PM09JNE
Firmware Version: 3.AAD
User Capacity:    400,088,457,216 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Nov 13 06:44:17 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 430) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 179) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   073   055   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   090   090   036    Pre-fail  Always       -       413
  7 Seek_Error_Rate         0x000f   089   060   030    Pre-fail  Always       -       830463240
  9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       33417
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       30
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       53445
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   081   048   045    Old_age   Always       -       19 (64 84 24 16)
194 Temperature_Celsius     0x0022   019   052   000    Old_age   Always       -       19 (0 14 0 0)
195 Hardware_ECC_Recovered  0x001a   060   046   000    Old_age   Always       -       58836757
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     33400         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
comment:4 Changed at 2010-11-13T07:35:43Z by adi
Replying to zooko:
> I don't know how to interpret these numbers. Does this mean those drives are failing?
Possibly. The following are concerning:
- Reallocated_Sector_Ct 413
- Seek_Error_Rate 830320960
- Reported_Uncorrect 53445
- Hardware_ECC_Recovered 2845069

In particular, Reallocated_Sector_Ct is high (its raw value is almost always a plain counter, so 413 is likely the actual number of reallocated sectors).
On IRC zooko also mentioned that the drive is showing low throughput (both from hdparm and while rsyncing data), giving further evidence it's not entirely happy.
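For anyone who wants to repeat this check, here is a minimal Python sketch, not a definitive health test. It parses attribute rows in the `smartctl` layout shown in comment:2 and flags two things: any attribute whose normalized VALUE has fallen to or below its THRESH, and a nonzero raw value on a few attributes that are usually plain counters. The names `SmartAttr`, `parse_attributes`, `flag_concerns`, and `RAW_COUNTERS` are made up for this illustration, and the choice of which attributes to treat as counters is an assumption. Raw values for Seek_Error_Rate and Hardware_ECC_Recovered are vendor-specific and may not be simple error counts, so the sketch deliberately does not flag them by raw value.

```python
import re
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized value (higher is better)
    worst: int
    thresh: int
    raw: int     # first integer of the RAW_VALUE column


# Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(\d+)"
)

# Assumption: these raw values are plain event counts on most drives.
RAW_COUNTERS = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def parse_attributes(text):
    """Return a SmartAttr for every line that looks like an attribute row."""
    attrs = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attr_id, name, value, worst, thresh, raw = m.groups()
            attrs.append(SmartAttr(int(attr_id), name, int(value),
                                   int(worst), int(thresh), int(raw)))
    return attrs


def flag_concerns(attrs):
    """Return human-readable notes for attributes worth a closer look."""
    notes = []
    for a in attrs:
        if a.thresh > 0 and a.value <= a.thresh:
            notes.append(f"{a.name}: value {a.value} at or below threshold {a.thresh}")
        elif a.name in RAW_COUNTERS and a.raw > 0:
            notes.append(f"{a.name}: raw count {a.raw}")
    return notes


# A few rows copied from the /dev/sdb output in comment:2.
SDB_SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   090   090   036    Pre-fail  Always       -       413
  7 Seek_Error_Rate         0x000f   089   060   030    Pre-fail  Always       -       830320960
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       53445
195 Hardware_ECC_Recovered  0x001a   049   046   000    Old_age   Always       -       2845069
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    for note in flag_concerns(parse_attributes(SDB_SAMPLE)):
        print(note)
```

On the sample rows this flags Reallocated_Sector_Ct and Reported_Uncorrect, which matches the two counters above that are hardest to explain away.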
comment:5 follow-up: ↓ 6 Changed at 2010-11-14T19:32:17Z by davidsarah
I'm confused. For me the timeline always gives a 500 error; at least, I've never seen it succeed within the past few months.
comment:6 in reply to: ↑ 5 ; follow-up: ↓ 7 Changed at 2010-11-14T20:04:24Z by zooko
Replying to davidsarah:
> I'm confused. For me the timeline always gives a 500 error; at least, I've never seen it succeed within the past few months.

Oh, wow! That's interesting. So, two nights ago I discovered, with adi's help, that /dev/sdb was dying, and I moved all of the trac state over to /dev/sda. This should make trac go faster as well as emit I/O errors less often.
Now, I wonder if there is something particular to your user account in trac that makes this always happen for you. Have you tried doing it when logged out of trac?
comment:7 in reply to: ↑ 6 Changed at 2010-11-15T02:21:15Z by davidsarah_test
- Description modified (diff)
- Summary changed from trac timeline occasionally gives errors to trac timeline doesn't like davidsarah (maybe all admin accounts?)
- Version changed from 1.8.0 to n/a
Replying to zooko:
> Now, I wonder if there is something particular to your user account in trac that makes this always happen for you. Have you tried doing it when logged out of trac?
That appears to be it. When logged out or when logged in as davidsarah_test (which does not have any Admin privileges), I can always see the timeline; when logged in as davidsarah, I always get a 500 error.
Not true! I use the Timeline many times every day. It usually works. Occasionally it gives errors. People (Twisted devs) tell me that this is standard for Trac and that there is nothing to be done about it, but I hope this isn't true. In any case, the workaround is to retry right away, or if that doesn't work, wait a few seconds and retry.
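The retry workaround can be scripted. The following is a rough Python sketch under stated assumptions: it requests the timeline anonymously (so it does not reproduce the always-failing admin-account case), treats any 5xx status as a transient failure, retries once immediately, and only then starts waiting between attempts. The helper names `fetch_status` and `retry_until_ok`, the attempt count, and the 5-second delay are all illustrative choices, not anything Trac prescribes.

```python
import time
import urllib.error
import urllib.request

TIMELINE_URL = "http://tahoe-lafs.org/trac/tahoe-lafs/timeline"


def fetch_status(url):
    """Return the HTTP status code for a GET of url (anonymous request)."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is still what we want.
        return err.code


def retry_until_ok(fetch, attempts=4, delay=5.0, sleep=time.sleep):
    """Call fetch() until it returns a non-5xx status or attempts run out.

    The first retry happens right away; later retries wait `delay` seconds.
    Returns the last status seen.
    """
    status = None
    for attempt in range(attempts):
        if attempt >= 2:
            sleep(delay)
        status = fetch()
        if status < 500:
            return status
    return status


if __name__ == "__main__":
    print(retry_until_ok(lambda: fetch_status(TIMELINE_URL)))
```

Passing `fetch` and `sleep` as parameters keeps the retry logic testable without touching the network.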