Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#46147 closed defect (fixed)

smartmontools @6.2 - smartd automated scanning failing

Reported by: thatrat@…
Owned by: pixilla (Bradley Giesbrecht)
Priority: Normal
Milestone:
Component: ports
Version: 2.3.3
Keywords:
Cc:
Port: smartmontools

Description

smartd (from smartmontools) no longer runs scheduled self-tests on my drives under Yosemite, even though the configuration file at /opt/local/etc/smartd.conf is set up for it.

Here's the relevant line in /opt/local/etc/smartd.conf, which should run a short self-test at 5 AM every morning and a long self-test at 6 AM:

/dev/disk0 -H -l error -l selftest -f -s (S/../.././05|L/../.././06) -m xxxxxxx@xxxxx.xxx
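
For readers unfamiliar with the -s directive: per the smartd.conf documentation, smartd compares the regular expression against a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour). The Python sketch below only illustrates that matching for the line above. The helper name due_tests is hypothetical, and using a full-string match with Python's re module in place of smartd's own regex engine is an assumption.

```python
import re
from datetime import datetime

# Schedule expression copied from the smartd.conf line above.
SCHEDULE = r"(S/../.././05|L/../.././06)"

def due_tests(now, schedule=SCHEDULE, test_types="LSCO"):
    """Return the self-test types whose T/MM/DD/d/HH string matches the schedule.

    Hypothetical helper: it mimics the documented string format, not smartd's code.
    """
    stamp = f"{now.month:02d}/{now.day:02d}/{now.isoweekday()}/{now.hour:02d}"
    return [t for t in test_types if re.fullmatch(schedule, f"{t}/{stamp}")]

# The two timestamps from the console log in this ticket, plus one idle hour.
print(due_tests(datetime(2014, 11, 25, 5, 22)))  # short test hour
print(due_tests(datetime(2014, 11, 25, 6, 22)))  # long test hour
print(due_tests(datetime(2014, 11, 25, 7, 22)))  # nothing scheduled
```

Under these assumptions the expression selects a short test in the 05 hour and a long test in the 06 hour, which agrees with the times at which the failures were logged.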

I took a look at the console, and saw the following lines when the scanning was supposed to occur:

Nov 25 05:22:24 Ubences-Intel-iMac.local smartd[152]: Authorization, server not available
Nov 25 05:22:24 Ubences-Intel-iMac.local smartd[152]: Device: /dev/disk0, execute Short Self-Test failed.
Nov 25 06:22:25 Ubences-Intel-iMac.local smartd[152]: Device: /dev/disk0, execute Long Self-Test failed.

What stands out to me is the first line "Authorization, server not available".

I've searched for this particular message, but I can't find anything that explains what causes it.
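
To check other machines for the same symptom, the failed self-test lines can be pulled out of a saved log. This is a minimal sketch that assumes the message wording shown above; the function name failed_self_tests is hypothetical, and the pattern has only been checked against the lines quoted in this ticket.

```python
import re

# Console lines copied from this ticket.
LOG = """\
Nov 25 05:22:24 Ubences-Intel-iMac.local smartd[152]: Authorization, server not available
Nov 25 05:22:24 Ubences-Intel-iMac.local smartd[152]: Device: /dev/disk0, execute Short Self-Test failed.
Nov 25 06:22:25 Ubences-Intel-iMac.local smartd[152]: Device: /dev/disk0, execute Long Self-Test failed.
"""

# Matches only the "execute ... Self-Test failed" messages, whatever the timestamp format.
FAILED = re.compile(
    r"smartd\[(?P<pid>\d+)\]: Device: (?P<device>[^,]+), "
    r"execute (?P<test>\w+) Self-Test failed"
)

def failed_self_tests(log_text):
    """Return (device, test type) pairs for every failed self-test message."""
    return [(m["device"], m["test"]) for m in FAILED.finditer(log_text)]

for device, test in failed_self_tests(LOG):
    print(f"{device}: {test} self-test failed")
```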

I think this might not have been working since Mavericks. It used to work perfectly a few Mac OS X releases ago.

I'm running the latest version of smartmontools that MacPorts has available [6.2] on Yosemite.

Below is output from running smartctl -a /dev/disk0. The test log shows when the automated scanning used to work properly.

Ubences-Intel-iMac:~ uquevedo$ smartctl -a /dev/disk0
smartctl 6.2 2013-07-26 r3841 [x86_64-apple-darwin14.0.0] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Black
Device Model:     WDC WD1002FAEX-00Y9A0
Serial Number:    [Hiding this on purpose]
LU WWN Device Id: 5 0014ee 2064339e8
Firmware Version: 05.01D05
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Dec  4 17:03:58 2014 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85)	Offline data collection activity
					was aborted by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(16860) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 174) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       13
  3 Spin_Up_Time            0x0027   171   170   021    Pre-fail  Always       -       4433
  4 Start_Stop_Count        0x0032   076   076   000    Old_age   Always       -       24328
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5066
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   079   079   000    Old_age   Always       -       21823
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       21
193 Load_Cycle_Count        0x0032   192   192   000    Old_age   Always       -       24306
194 Temperature_Celsius     0x0022   107   081   000    Old_age   Always       -       40
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      5066         -
# 2  Extended offline    Completed without error       00%      4673         -
# 3  Extended offline    Completed without error       00%      4080         -
# 4  Short offline       Completed without error       00%      2987         -
# 5  Extended offline    Aborted by host               80%      2985         -
# 6  Short offline       Completed without error       00%      2982         -
# 7  Short offline       Completed without error       00%      2975         -
# 8  Short offline       Aborted by host               90%      2962         -
# 9  Short offline       Completed without error       00%      2960         -
#10  Short offline       Completed without error       00%      2957         -
#11  Short offline       Completed without error       00%      2956         -
#12  Short offline       Completed without error       00%      2950         -
#13  Short offline       Completed without error       00%      2948         -
#14  Short offline       Completed without error       00%      2946         -
#15  Short offline       Completed without error       00%      2944         -
#16  Short offline       Completed without error       00%      2941         -
#17  Short offline       Completed without error       00%      2939         -
#18  Short offline       Completed without error       00%      2934         -
#19  Short offline       Completed without error       00%      2933         -
#20  Short offline       Completed without error       00%      2929         -
#21  Short offline       Completed without error       00%      2927         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
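
The self-test log above can also be read mechanically to see where scheduled tests stopped appearing. The sketch below takes the first six log rows and lists the stretches of power-on hours with no logged test. The 48-hour threshold and the function names are assumptions chosen for illustration, and power-on hours are not wall-clock time on a machine that sleeps or is switched off.

```python
import re

# Self-test log rows 1-6 copied from the smartctl output above.
LOG = """\
# 1  Short offline       Completed without error       00%      5066         -
# 2  Extended offline    Completed without error       00%      4673         -
# 3  Extended offline    Completed without error       00%      4080         -
# 4  Short offline       Completed without error       00%      2987         -
# 5  Extended offline    Aborted by host               80%      2985         -
# 6  Short offline       Completed without error       00%      2982         -
"""

# Captures the LifeTime(hours) column, which follows the "Remaining" percentage.
ROW = re.compile(r"^#\s*\d+\s+.*?\d+%\s+(\d+)\s+\S+\s*$", re.MULTILINE)

def lifetime_hours(log_text):
    """Return the LifeTime(hours) value of every self-test log row."""
    return [int(h) for h in ROW.findall(log_text)]

def gaps_over(hours, threshold=48):
    """Return (start, end) pairs of consecutive tests more than `threshold` hours apart."""
    ordered = sorted(hours)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > threshold]

for start, end in gaps_over(lifetime_hours(LOG)):
    print(f"no self-test logged between {start} h and {end} h ({end - start} h)")
```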

Attachments (1)

patch-sysutils-smartmontools-no-fork.diff (668 bytes) - added by pixilla (Bradley Giesbrecht) 9 years ago.


Change History (9)

comment:1 Changed 9 years ago by pixilla (Bradley Giesbrecht)

Does the recent update to version 6.3 solve this issue? See r129182.

comment:2 Changed 9 years ago by thatrat@…

Unfortunately, no:

12/8/14 7:33:23.362 PM smartd[2669]: Authorization, server not available
12/8/14 7:33:23.362 PM smartd[2669]: Device: /dev/disk0, execute Long Self-Test failed.
12/8/14 8:03:23.519 PM smartd[2669]: Device: /dev/disk0, execute Short Self-Test failed.

However, running the daemon manually does work:

Ubences-Intel-iMac:~ uquevedo$ sudo /opt/local/sbin/smartd -n -d -c /opt/local/etc/smartd.conf
smartd 6.3 2014-07-26 r3976 [x86_64-apple-darwin14.0.0] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

Opened configuration file /opt/local/etc/smartd.conf
Configuration file /opt/local/etc/smartd.conf parsed.
Device: /dev/disk0, opened
Device: /dev/disk0, WDC WD1002FAEX-00Y9A0, S/N:WD-WCAW32908017, WWN:5-0014ee-2064339e8, FW:05.01D05, 1.00 TB
Device: /dev/disk0, found in smartd database: Western Digital Black
Device: /dev/disk0, is SMART capable. Adding to "monitor" list.
Device: /dev/disk0, state read from /opt/local/var/lib/smartmontools/smartd.WDC_WD1002FAEX_00Y9A0-WD_WCAW32908017.ata.state
Monitoring 1 ATA and 0 SCSI devices
Device: /dev/disk0, opened ATA device
Device: /dev/disk0, state written to /opt/local/var/lib/smartmontools/smartd.WDC_WD1002FAEX_00Y9A0-WD_WCAW32908017.ata.state
Device: /dev/disk0, opened ATA device
Device: /dev/disk0, opened ATA device
Device: /dev/disk0, starting scheduled Short Self-Test.
Device: /dev/disk0, state written to /opt/local/var/lib/smartmontools/smartd.WDC_WD1002FAEX_00Y9A0-WD_WCAW32908017.ata.state
Device: /dev/disk0, opened ATA device
Device: /dev/disk0, opened ATA device
Device: /dev/disk0, starting scheduled Long Self-Test.
Device: /dev/disk0, state written to /opt/local/var/lib/smartmontools/smartd.WDC_WD1002FAEX_00Y9A0-WD_WCAW32908017.ata.state
Device: /dev/disk0, opened ATA device
Device: /dev/disk0, opened ATA device
Device: /dev/disk0, opened ATA device
Device: /dev/disk0, opened ATA device
Device: /dev/disk0, opened ATA device
Device: /dev/disk0, opened ATA device
^\smartd received signal 3: Quit: 3
Device: /dev/disk0, state written to /opt/local/var/lib/smartmontools/smartd.WDC_WD1002FAEX_00Y9A0-WD_WCAW32908017.ata.state
smartd is exiting (exit status 0)
Ubences-Intel-iMac:~ uquevedo$ smartctl -a /dev/disk0
smartctl 6.3 2014-07-26 r3976 [x86_64-apple-darwin14.0.0] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Black
Device Model:     WDC WD1002FAEX-00Y9A0
Serial Number:    WD-WCAW32908017
LU WWN Device Id: 5 0014ee 2064339e8
Firmware Version: 05.01D05
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Dec  9 05:47:27 2014 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(16860) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 174) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       13
  3 Spin_Up_Time            0x0027   172   170   021    Pre-fail  Always       -       4375
  4 Start_Stop_Count        0x0032   076   076   000    Old_age   Always       -       24368
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       5124
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   079   079   000    Old_age   Always       -       21861
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       21
193 Load_Cycle_Count        0x0032   192   192   000    Old_age   Always       -       24346
194 Temperature_Celsius     0x0022   101   081   000    Old_age   Always       -       46
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      5124         -
# 2  Short offline       Completed without error       00%      5120         -
# 3  Short offline       Completed without error       00%      5093         -
# 4  Extended offline    Completed without error       00%      5082         -
# 5  Short offline       Completed without error       00%      5069         -
# 6  Short offline       Completed without error       00%      5066         -
# 7  Extended offline    Completed without error       00%      4673         -
# 8  Extended offline    Completed without error       00%      4080         -
# 9  Short offline       Completed without error       00%      2987         -
#10  Extended offline    Aborted by host               80%      2985         -
#11  Short offline       Completed without error       00%      2982         -
#12  Short offline       Completed without error       00%      2975         -
#13  Short offline       Aborted by host               90%      2962         -
#14  Short offline       Completed without error       00%      2960         -
#15  Short offline       Completed without error       00%      2957         -
#16  Short offline       Completed without error       00%      2956         -
#17  Short offline       Completed without error       00%      2950         -
#18  Short offline       Completed without error       00%      2948         -
#19  Short offline       Completed without error       00%      2946         -
#20  Short offline       Completed without error       00%      2944         -
#21  Short offline       Completed without error       00%      2941         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

comment:3 Changed 9 years ago by pixilla (Bradley Giesbrecht)

Keywords: smartmontools smartd smartctl removed
Owner: changed from macports-tickets@… to takanori@…

Changed 9 years ago by pixilla (Bradley Giesbrecht): attachment patch-sysutils-smartmontools-no-fork.diff added

comment:4 Changed 9 years ago by pixilla (Bradley Giesbrecht)

thatrat: does the attached no-fork patch solve the issue for you? With this patch, smartmontools has been running for 24+ hours here.
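
The contents of the attached patch are not reproduced in this ticket, so the following is not the patch itself. It is only a sketch of what a launchd job that keeps smartd in the foreground could look like, using the -n (no fork) and -c options that also appear in the manual invocation in comment 2. The label org.example.smartd, the KeepAlive and RunAtLoad settings, and the function name are assumptions made for illustration.

```python
import plistlib

def smartd_launchd_plist(label="org.example.smartd",
                         smartd="/opt/local/sbin/smartd",
                         conf="/opt/local/etc/smartd.conf"):
    """Build a launchd job definition that runs smartd without forking.

    Hypothetical sketch: the label and the keys chosen here are assumptions,
    not the contents of the MacPorts patch.
    """
    job = {
        "Label": label,
        # -n keeps smartd in the foreground; -c names the configuration file.
        "ProgramArguments": [smartd, "-n", "-c", conf],
        "RunAtLoad": True,
        "KeepAlive": True,
    }
    return plistlib.dumps(job)

print(smartd_launchd_plist().decode("utf-8"))
```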

comment:5 Changed 9 years ago by thatrat@…

It looks like the patch worked.

The output below shows the scheduled tests running again: the newest self-test log entries (LifeTime 5135 to 5146 hours) are close to the current Power_On_Hours value (5149):

Ubences-Intel-iMac:smartmontools uquevedo$ smartctl -a /dev/disk0
smartctl 6.3 2014-07-26 r3976 [x86_64-apple-darwin14.0.0] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Black
Device Model:     WDC WD1002FAEX-00Y9A0
Serial Number:    [hiding this]
LU WWN Device Id: 5 0014ee 2064339e8
Firmware Version: 05.01D05
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Dec 11 12:32:48 2014 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85)	Offline data collection activity
					was aborted by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(16860) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 174) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       13
  3 Spin_Up_Time            0x0027   174   170   021    Pre-fail  Always       -       4300
  4 Start_Stop_Count        0x0032   076   076   000    Old_age   Always       -       24413
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       5149
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   079   079   000    Old_age   Always       -       21882
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       21
193 Load_Cycle_Count        0x0032   192   192   000    Old_age   Always       -       24391
194 Temperature_Celsius     0x0022   102   081   000    Old_age   Always       -       45
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      5146         -
# 2  Extended offline    Completed without error       00%      5143         -
# 3  Short offline       Completed without error       00%      5140         -
# 4  Extended offline    Completed without error       00%      5135         -
# 5  Extended offline    Completed without error       00%      5124         -
# 6  Short offline       Completed without error       00%      5120         -
# 7  Short offline       Completed without error       00%      5093         -
# 8  Extended offline    Completed without error       00%      5082         -
# 9  Short offline       Completed without error       00%      5069         -
#10  Short offline       Completed without error       00%      5066         -
#11  Extended offline    Completed without error       00%      4673         -
#12  Extended offline    Completed without error       00%      4080         -
#13  Short offline       Completed without error       00%      2987         -
#14  Extended offline    Aborted by host               80%      2985         -
#15  Short offline       Completed without error       00%      2982         -
#16  Short offline       Completed without error       00%      2975         -
#17  Short offline       Aborted by host               90%      2962         -
#18  Short offline       Completed without error       00%      2960         -
#19  Short offline       Completed without error       00%      2957         -
#20  Short offline       Completed without error       00%      2956         -
#21  Short offline       Completed without error       00%      2950         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
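
The same check can be scripted when verifying other systems. This is a minimal sketch that assumes the smartctl -a layout shown above; the function names are hypothetical, and the sample holds only the Power_On_Hours row and the three newest log rows from this output.

```python
import re

# Rows copied from the smartctl output above.
SAMPLE = """\
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       5149
# 1  Extended offline    Completed without error       00%      5146         -
# 2  Extended offline    Completed without error       00%      5143         -
# 3  Short offline       Completed without error       00%      5140         -
"""

def power_on_hours(text):
    """Return the raw value of attribute 9 (Power_On_Hours), or None if absent."""
    m = re.search(r"^\s*9\s+Power_On_Hours\b.*?(\d+)\s*$", text, re.MULTILINE)
    return int(m.group(1)) if m else None

def latest_test_hour(text):
    """Return the largest LifeTime(hours) value in the self-test log, or None."""
    hours = re.findall(r"^#\s*\d+\s+.*?\d+%\s+(\d+)\s+\S+\s*$", text, re.MULTILINE)
    return max(map(int, hours)) if hours else None

def hours_since_last_test(text):
    """Return power-on hours elapsed since the most recent logged self-test."""
    poh, last = power_on_hours(text), latest_test_hour(text)
    if poh is None or last is None:
        return None
    return poh - last

print(hours_since_last_test(SAMPLE))
```

A small result suggests the daemon is starting tests on schedule, while a large one points to the failure described in this ticket.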

Can this patch be rolled into the main release so I can update some other systems to test?

comment:6 Changed 9 years ago by pixilla (Bradley Giesbrecht)

Owner: changed from takanori@… to pixilla@…
Status: new → assigned

comment:7 Changed 9 years ago by pixilla (Bradley Giesbrecht)

Resolution: fixed
Status: assigned → closed

See r129382

comment:8 in reply to:  5 Changed 9 years ago by pixilla (Bradley Giesbrecht)

Replying to thatrat@…:

Can this patch be rolled into the main release so I can update some other systems to test?

Done.
