#1250 closed defect (fixed)

trac timeline doesn't like davidsarah

Reported by: davidsarah
Owned by: somebody
Priority: minor
Milestone: soon (release n/a)
Component: dev-infrastructure
Version: n/a
Keywords: trac
Cc:
Launchpad Bug:

Description (last modified by davidsarah_test)

http://tahoe-lafs.org/trac/tahoe-lafs/timeline gives a 500 Internal Server Error. This has been broken for a while (a few months?), IIRC. The message is completely generic, but says that "More information about this error may be available in the server error log."

Workarounds:

Change History (8)

comment:1 Changed at 2010-11-05T16:57:34Z by zooko

  • Summary changed from trac timeline is broken to trac timeline occasionally gives errors

Not true! I use the Timeline many times every day. It usually works; occasionally it gives errors. People (Twisted devs) tell me that this is standard for Trac and that there is nothing to be done about it, but I hope this isn't true. In any case, the workaround is to retry right away or, if that doesn't work, to wait a few seconds and retry.
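The retry workaround can be scripted. The following is a minimal sketch, not anything the Trac instance provides: the function name `fetch_with_retry`, its parameters, and the choice to retry only on HTTP 5xx responses are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=3, delay_seconds=5.0,
                     opener=urllib.request.urlopen):
    """Fetch url, retrying when the server answers with an HTTP 5xx error.

    The first retry happens immediately; every later retry waits
    delay_seconds first. `opener` is injectable so the function can be
    exercised without touching the network.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        # attempt 0 is the initial request, attempt 1 the immediate retry.
        if attempt >= 2:
            time.sleep(delay_seconds)
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code < 500:
                raise  # client errors will not go away by retrying
            last_error = err
    raise last_error
```

Calling `fetch_with_retry("http://tahoe-lafs.org/trac/tahoe-lafs/timeline")` would make up to three requests before giving up and re-raising the last 5xx error.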

comment:2 Changed at 2010-11-12T13:42:48Z by zooko

Perhaps the underlying hard drives are dying:

root@tahoe-lafs:~# smartctl --all /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.9 family
Device Model:     ST3400633AS
Serial Number:    3PM0AE7Y
Firmware Version: 3.AAD
User Capacity:    400,088,457,216 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Nov 12 13:41:32 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 179) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   095   079   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       26
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   086   060   030    Pre-fail  Always       -       415054934
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10759
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       27
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   081   059   045    Old_age   Always       -       19 (Lifetime Min/Max 18/26)
194 Temperature_Celsius     0x0022   019   041   000    Old_age   Always       -       19 (0 18 0 0)
195 Hardware_ECC_Recovered  0x001a   060   052   000    Old_age   Always       -       81757662
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged   

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      7243         -
# 2  Short captive       Interrupted (host reset)      70%      7243         -
# 3  Extended captive    Interrupted (host reset)      30%      7237         -
# 4  Short offline       Completed without error       00%      7237         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@tahoe-lafs:~# smartctl --all /dev/sdb
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.9 family
Device Model:     ST3400633AS
Serial Number:    3PM09JNE
Firmware Version: 3.AAD
User Capacity:    400,088,457,216 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Nov 12 13:41:35 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 179) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   073   055   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   090   090   036    Pre-fail  Always       -       413
  7 Seek_Error_Rate         0x000f   089   060   030    Pre-fail  Always       -       830320960
  9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       33400
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       30
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       53445
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   083   048   045    Old_age   Always       -       17 (64 84 24 16)
194 Temperature_Celsius     0x0022   017   052   000    Old_age   Always       -       17 (0 14 0 0)
195 Hardware_ECC_Recovered  0x001a   049   046   000    Old_age   Always       -       2845069
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged   

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I don't know how to interpret these numbers. Does this mean those drives are failing?

comment:3 Changed at 2010-11-13T06:45:47Z by zooko

Later, after much normal activity plus some runs of hdparm: (note that you have to scroll down for /dev/sdb results)

root@tahoe-lafs:~# smartctl --all /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.9 family
Device Model:     ST3400633AS
Serial Number:    3PM0AE7Y
Firmware Version: 3.AAD
User Capacity:    400,088,457,216 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Nov 13 06:44:00 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 179) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   095   079   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       26
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   086   060   030    Pre-fail  Always       -       415146516
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10775
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       27
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   079   059   045    Old_age   Always       -       21 (Lifetime Min/Max 18/26)
194 Temperature_Celsius     0x0022   021   041   000    Old_age   Always       -       21 (0 18 0 0)
195 Hardware_ECC_Recovered  0x001a   061   052   000    Old_age   Always       -       99357267
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged   

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      7243         -
# 2  Short captive       Interrupted (host reset)      70%      7243         -
# 3  Extended captive    Interrupted (host reset)      30%      7237         -
# 4  Short offline       Completed without error       00%      7237         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@tahoe-lafs:~# smartctl --all /dev/sdb
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.9 family
Device Model:     ST3400633AS
Serial Number:    3PM09JNE
Firmware Version: 3.AAD
User Capacity:    400,088,457,216 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sat Nov 13 06:44:17 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 179) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   073   055   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   090   090   036    Pre-fail  Always       -       413
  7 Seek_Error_Rate         0x000f   089   060   030    Pre-fail  Always       -       830463240
  9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       33417
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       30
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       53445
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   081   048   045    Old_age   Always       -       19 (64 84 24 16)
194 Temperature_Celsius     0x0022   019   052   000    Old_age   Always       -       19 (0 14 0 0)
195 Hardware_ECC_Recovered  0x001a   060   046   000    Old_age   Always       -       58836757
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged   

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     33400         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
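To see what changed on /dev/sdb between the two runs, the raw values can be subtracted. The numbers below are copied from the two smartctl outputs in this ticket; the dictionary and function names are hypothetical, and this is only a worked subtraction, not a health verdict.

```python
# Raw values for /dev/sdb, copied by hand from the two smartctl runs
# in this ticket (2010-11-12 and 2010-11-13).
SDB_BEFORE = {
    "Power_On_Hours": 33400,
    "Reallocated_Sector_Ct": 413,
    "Reported_Uncorrect": 53445,
    "Seek_Error_Rate": 830320960,
    "Hardware_ECC_Recovered": 2845069,
}
SDB_AFTER = {
    "Power_On_Hours": 33417,
    "Reallocated_Sector_Ct": 413,
    "Reported_Uncorrect": 53445,
    "Seek_Error_Rate": 830463240,
    "Hardware_ECC_Recovered": 58836757,
}


def raw_deltas(before, after):
    """Return after - before for every attribute present in both snapshots."""
    return {name: after[name] - before[name]
            for name in before if name in after}


for name, delta in raw_deltas(SDB_BEFORE, SDB_AFTER).items():
    print(f"{name:24} {delta:+d}")
```

Over the 17 power-on hours between the runs, the reallocated-sector and reported-uncorrectable counts did not move, while the Hardware_ECC_Recovered raw value grew by 55991688.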

comment:4 Changed at 2010-11-13T07:35:43Z by adi

Replying to zooko:

> I don't know how to interpret these numbers. Does this mean those drives are failing?

Possibly. The following are concerning:

Reallocated_Sector_Ct       413   
Seek_Error_Rate             830320960
Reported_Uncorrect          53445 
Hardware_ECC_Recovered      2845069

In particular, Reallocated_Sector_Ct is high. Its raw value is almost always a plain counter, so 413 is likely the actual number of reallocated sectors.

On IRC zooko also mentioned that the drive is showing low throughput (both from hdparm and while rsyncing data), giving further evidence it's not entirely happy.
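A rough way to automate this kind of reading is sketched below. It parses the attribute rows of `smartctl --all` output and reports attributes whose normalized value has reached its threshold, plus a few raw counters that are nonzero. The function names and the `WATCHED_RAW_COUNTERS` list are assumptions made for this sketch. Seek_Error_Rate and Hardware_ECC_Recovered are deliberately left out of the list, because /dev/sda shows comparably large raw values for both in the outputs above, so a simple nonzero test would not separate the two drives.

```python
# Attributes whose raw value is treated here as a plain event count.
# This selection is an assumption of the sketch, not a smartmontools rule.
WATCHED_RAW_COUNTERS = (
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_smart_attributes(text):
    """Return {name: (value, worst, thresh, raw)} from smartctl output."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        value, worst, thresh = (int(f) for f in fields[3:6])
        raw = int(fields[9])  # trailing notes such as "(0 14 0 0)" are ignored
        attributes[fields[1]] = (value, worst, thresh, raw)
    return attributes


def concerning(attributes):
    """List attributes at or below threshold, or with nonzero watched counters."""
    findings = []
    for name, (value, worst, thresh, raw) in attributes.items():
        if thresh > 0 and value <= thresh:
            findings.append(f"{name}: normalized value {value} <= threshold {thresh}")
        elif name in WATCHED_RAW_COUNTERS and raw > 0:
            findings.append(f"{name}: raw count {raw}")
    return findings


# A few rows from the /dev/sdb output in this ticket.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   073   055   006    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   090   090   036    Pre-fail  Always       -       413
  7 Seek_Error_Rate         0x000f   089   060   030    Pre-fail  Always       -       830320960
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       53445
195 Hardware_ECC_Recovered  0x001a   049   046   000    Old_age   Always       -       2845069
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

for finding in concerning(parse_smart_attributes(SAMPLE)):
    print(finding)
```

On the sample rows this flags Reallocated_Sector_Ct and Reported_Uncorrect, matching two of the four attributes singled out above. None of the normalized values has reached its threshold, which is consistent with the overall self-assessment still reporting PASSED.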

comment:5 follow-up: ↓ 6 Changed at 2010-11-14T19:32:17Z by davidsarah

I'm confused. For me the timeline always gives a 500 error; at least, I've never seen it succeed within the past few months.

comment:6 in reply to: ↑ 5 ; follow-up: ↓ 7 Changed at 2010-11-14T20:04:24Z by zooko

Replying to davidsarah:

> I'm confused. For me the timeline always gives a 500 error; at least, I've never seen it succeed within the past few months.

Oh, wow! That's interesting. So, two nights ago I discovered, with adi's help, that /dev/sdb was dying, and I moved all of the trac state over to /dev/sda. This should make trac go faster as well as emit fewer I/O errors.

Now, I wonder if there is something particular to your user account in trac that makes this always happen for you. Have you tried doing it when logged out of trac?

comment:7 in reply to: ↑ 6 Changed at 2010-11-15T02:21:15Z by davidsarah_test

  • Description modified (diff)
  • Summary changed from trac timeline occasionally gives errors to trac timeline doesn't like davidsarah (maybe all admin accounts?)
  • Version changed from 1.8.0 to n/a

Replying to zooko:

> Now, I wonder if there is something particular to your user account in trac that makes this always happen for you. Have you tried doing it when logged out of trac?

That appears to be it. When logged out or when logged in as davidsarah_test (which does not have any Admin privileges), I can always see the timeline; when logged in as davidsarah, I always get a 500 error.

comment:8 Changed at 2010-11-16T01:27:03Z by davidsarah

  • Resolution set to fixed
  • Status changed from new to closed
  • Summary changed from trac timeline doesn't like davidsarah (maybe all admin accounts?) to trac timeline doesn't like davidsarah

This was due to a high "days back" setting (512 days) for the timeline view; see http://trac.edgewall.org/ticket/9829 for details.
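For anyone hitting the same error, one possible way to check is to request the timeline with an explicitly small window. The sketch below only builds such a URL. It assumes that the timeline accepts a `daysback` query parameter and that this parameter takes precedence over the saved per-user setting; the helper name `timeline_url` is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def timeline_url(base_url, daysback):
    """Return base_url with its query replaced by an explicit daysback value."""
    parts = urlsplit(base_url)
    # Assumption: "daysback" is the name of the timeline's query parameter.
    query = urlencode({"daysback": daysback})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(timeline_url("http://tahoe-lafs.org/trac/tahoe-lafs/timeline", 30))
```

If the page loads with a small window but fails with the account's saved value, that points at the "days back" setting rather than at the account's privileges.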
