Discussion:
Kernel stability on baytrail machines
Michal Feix
2016-03-03 16:40:47 UTC
Hello everyone,

Approx. 6 months ago I started facing random freezes on the baytrail based
computers I manage. It took me a while before I found a bug report in
freedesktop bugzilla named "complete freeze after: drm/i915/vlv: WA for
Turbo and RC6 to work together" -
https://bugs.freedesktop.org/show_bug.cgi?id=88012. It took a few more
months for this bug to be escalated to MAJOR importance, and it was later
moved into kernel bugzilla as "intel_idle.max_cstate=1 required on
baytrail to prevent crashes" -
https://bugzilla.kernel.org/show_bug.cgi?id=109051.

Based on the number of comments in both bug tickets and probably
connected observations on various Linux distro forums, this seems to
be a showstopper on mainstream Baytrail based machines for many users.
I'm trying to understand how visible (and thus important) this
instability is across the population of baytrail machines running Linux.

I do feel that the importance of the mentioned bug is currently
underestimated. Can anyone here comment on how stable the current Linux
kernel is supposed to be on baytrail machines in general?

Cheers,
--
Michael
Pavel Machek
2016-04-09 16:01:34 UTC
Hi!
Post by Michal Feix
Approx. 6 months ago I started facing random freezes on my baytrail
based computers I manage. It took me a while before I found a bug
report in freedesktop bugzilla named "complete freeze after:
drm/i915/vlv: WA for Turbo and RC6 to work together" -
https://bugs.freedesktop.org/show_bug.cgi?id=88012. It took a few
more months for this bug to escalate into MAJOR importance and was
later moved into kernel bugzilla as "intel_idle.max_cstate=1
required on baytrail to prevent crashes" -
https://bugzilla.kernel.org/show_bug.cgi?id=109051.
Based on the amount of comments in both bugtickets and probably
connected observations on different linux distros forums, this seems
to be a showstopper on mainstream Baytrail based machines for many
users. I'm trying to understand, how visible (and thus important) is
this instability across baytrail machines on linux kernel across
population.
I do feel that the importance of the mentioned bug is currently
underestimated. Can anyone here give a note, how much current linux
kernel is supposed to be stable on general baytrail machines?
If you did not get any replies... you might want to check the MAINTAINERS file
and put the Intel x86 maintainers on the Cc list.

I'm sure someone cares :-).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
One Thousand Gnomes
2016-04-09 19:14:35 UTC
Post by Pavel Machek
Post by Michal Feix
I do feel that the importance of the mentioned bug is currently
underestimated. Can anyone here give a note, how much current linux
kernel is supposed to be stable on general baytrail machines?
If you did not get any replies... you might want to check MAINTAINERS file, and
put Intel x86 maintainers on Cc list.
I'm sure someone cares :-).
Yes we care, and there are people looking at the various reports.

Alan
Ezequiel Garcia
2016-07-12 19:49:09 UTC
Hi Alan,

(Adding interested people to this thread)
Post by One Thousand Gnomes
Post by Pavel Machek
Post by Michal Feix
I do feel that the importance of the mentioned bug is currently
underestimated. Can anyone here give a note, how much current linux
kernel is supposed to be stable on general baytrail machines?
If you did not get any replies... you might want to check MAINTAINERS file, and
put Intel x86 maintainers on Cc list.
I'm sure someone cares :-).
Yes we care, and there are people looking at the various reports.
Are there any updates on the status of this issue?

The current bugzilla report [1] marks this as a power management
issue. However, many reports indicate that it would only freeze
when running X, so it's not completely clear if it's related to
the gfx driver too.

Also, do we know which CPUs are affected by this issue?
And which are NOT affected :) - that would be quite relevant
when picking a CPU for a product.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=109051
--
Ezequiel Garcia, VanguardiaSur
www.vanguardiasur.com.ar
Pavel Machek
2016-07-12 21:25:06 UTC
Post by Ezequiel Garcia
Hi Alan,
(Adding interested people to this thread)
Post by One Thousand Gnomes
Post by Pavel Machek
Post by Michal Feix
I do feel that the importance of the mentioned bug is currently
underestimated. Can anyone here give a note, how much current linux
kernel is supposed to be stable on general baytrail machines?
If you did not get any replies... you might want to check MAINTAINERS file, and
put Intel x86 maintainers on Cc list.
I'm sure someone cares :-).
Yes we care, and there are people looking at the various reports.
Are there any updates on the status of this issue?
The current bugzilla report [1] marks this as a power management
issue. However, many reports indicate that it would only freeze
when running X, so it's not completely clear if it's related to
the gfx driver too.
Does

"intel_idle.max_cstate=1"

fix it for you?
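
For anyone wanting to confirm whether the workaround is actually in effect on a given boot, here is a minimal Python sketch. It assumes the running kernel's command line is readable from /proc/cmdline; the function names are made up for illustration, and treating the last occurrence of the option as the effective one is an assumption.

```python
import re
from typing import Optional


def max_cstate_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the intel_idle.max_cstate value found in a kernel command
    line string, or None if the option is not present."""
    value = None
    for token in cmdline.split():
        match = re.fullmatch(r"intel_idle\.max_cstate=(\d+)", token)
        if match:
            # Assumption: if the option is given more than once,
            # the last occurrence is the one that counts.
            value = int(match.group(1))
    return value


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

On a test machine, `max_cstate_from_cmdline(read_cmdline())` returning None would mean the option was not passed at boot.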

If you feel it is an X-only problem, you may want to provide details
about your graphics subsystem (DRM enabled? framebuffer only?) and
probably cc the i915 maintainers (entry below).

..actually... you may want to verify if it happens in unaccelerated X.

INTEL DRM DRIVERS (excluding Poulsbo, Moorestown and derivative
chipsets)
M: Daniel Vetter <***@intel.com>
M: Jani Nikula <***@linux.intel.com>
L: intel-***@lists.freedesktop.org
L: dri-***@lists.freedesktop.org
W: https://01.org/linuxgraphics/
Q: http://patchwork.freedesktop.org/project/intel-gfx/
T: git git://anongit.freedesktop.org/drm-intel
S: Supported
F: drivers/gpu/drm/i915/
F: include/drm/i915*
F: include/uapi/drm/i915_drm.h
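
As a rough illustration of how such an entry can be turned into a Cc list, here is a small Python sketch. It assumes the usual MAINTAINERS layout of a title followed by single-letter tagged lines (M: maintainer, L: mailing list, S: status, F: files); the function names are hypothetical, and lines without a tag are treated as part of the title because the title above is wrapped across two lines.

```python
import re
from typing import Dict, List, Tuple


def parse_maintainers_entry(text: str) -> Tuple[str, Dict[str, List[str]]]:
    """Split one MAINTAINERS entry into its title and tagged fields."""
    title_parts: List[str] = []
    fields: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = re.match(r"([A-Z]):\s*(.*)", line)
        if match:
            # Tags such as M: and F: may repeat, so collect them in lists.
            fields.setdefault(match.group(1), []).append(match.group(2))
        else:
            # Untagged lines belong to the (possibly wrapped) title.
            title_parts.append(line)
    return " ".join(title_parts), fields


def cc_list(fields: Dict[str, List[str]]) -> List[str]:
    """Maintainers (M:) first, then mailing lists (L:)."""
    return fields.get("M", []) + fields.get("L", [])
```

Feeding it the entry above would give the two M: lines followed by the two L: lines, with the addresses exactly as obfuscated as they appear here.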

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Michal Feix
2016-07-13 10:11:36 UTC
Post by Pavel Machek
Post by Ezequiel Garcia
Hi Alan,
(Adding interested people to this thread)
Post by One Thousand Gnomes
Post by Pavel Machek
Post by Michal Feix
I do feel that the importance of the mentioned bug is currently
underestimated. Can anyone here give a note, how much current linux
kernel is supposed to be stable on general baytrail machines?
If you did not get any replies... you might want to check MAINTAINERS file, and
put Intel x86 maintainers on Cc list.
I'm sure someone cares :-).
Yes we care, and there are people looking at the various reports.
Are there any updates on the status of this issue?
The current bugzilla report [1] marks this as a power management
issue. However, many reports indicate that it would only freeze
when running X, so it's not completely clear if it's related to
the gfx driver too.
Does
"intel_idle.max_cstate=1"
fix it for you?
Yes, it does.
Post by Pavel Machek
If you feel it is X-only problem, you may want to provide details
about your graphics subsystem (DRM enabled? framebuffer only?) and
probably cc.
It's not an X-only problem. It happens even in console mode, although
the console is switched to KMS during boot.
Post by Pavel Machek
...actually... you may want to verify if it happens in unaccelerated X.
As it happens even in console mode, is this a relevant test?

Michal
Pavel Machek
2016-07-13 10:48:32 UTC
Post by Michal Feix
Post by Pavel Machek
Post by Ezequiel Garcia
Hi Alan,
(Adding interested people to this thread)
Post by One Thousand Gnomes
Post by Pavel Machek
Post by Michal Feix
I do feel that the importance of the mentioned bug is currently
underestimated. Can anyone here give a note, how much current linux
kernel is supposed to be stable on general baytrail machines?
If you did not get any replies... you might want to check MAINTAINERS file, and
put Intel x86 maintainers on Cc list.
I'm sure someone cares :-).
Yes we care, and there are people looking at the various reports.
Are there any updates on the status of this issue?
The current bugzilla report [1] marks this as a power management
issue. However, many reports indicate that it would only freeze
when running X, so it's not completely clear if it's related to
the gfx driver too.
Does
"intel_idle.max_cstate=1"
fix it for you?
Yes, it does.
Post by Pavel Machek
If you feel it is X-only problem, you may want to provide details
about your graphics subsystem (DRM enabled? framebuffer only?) and
probably cc.
It's not X-only problem. Happens even in console mode, which is KMS
switched during boot though.
Post by Pavel Machek
...actually... you may want to verify if it happens in unaccelerated X.
As it happens even in console mode, is this relevant test?
No, no need to test with X.

Would it be possible to test in good old VGA mode?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Michal Feix
2016-07-19 04:51:34 UTC
On 13.7.2016 at 12:48, Pavel Machek wrote:
Post by Pavel Machek
Post by Michal Feix
Post by Pavel Machek
Post by Ezequiel Garcia
Are there any updates on the status of this issue?
The current bugzilla report [1] marks this as a power management
issue. However, many reports indicate that it would only freeze
when running X, so it's not completely clear if it's related to
the gfx driver too.
Does
"intel_idle.max_cstate=1"
fix it for you?
Yes, it does.
Post by Pavel Machek
If you feel it is X-only problem, you may want to provide details
about your graphics subsystem (DRM enabled? framebuffer only?) and
probably cc.
It's not X-only problem. Happens even in console mode, which is KMS
switched during boot though.
Post by Pavel Machek
...actually... you may want to verify if it happens in unaccelerated X.
As it happens even in console mode, is this relevant test?
No, no need to test with X.
Would it be possible to test in good old VGA mode?
Over the past few days I updated to the 4.6.3 kernel and tested with X and
without X, in console mode and with no KMS. My machine is way more stable
than with the previous 4.5.* and 4.4.* kernels I tried.

With the 4.6.3 kernel and X running, I had only one freeze, during a Firefox
session with video playback. For the rest of the two days of testing, no hang.

I was not able to hang the machine during another 2 days of testing with
the 4.6.3 kernel in console mode with KMS disabled. It's fair to say that
stress testing of GFX is quite limited when running in console mode :-).
I tried hard with mplayer with libcaca and with a repeated kernel
compilation task. No hang occurred.

My conclusion is that 4.6.3 is surely a huge improvement compared to the
4.5 and 4.4 kernels regarding the Bay Trail stability issue.

Michal Feix
One Thousand Gnomes
2016-07-18 13:30:25 UTC
On Tue, 12 Jul 2016 16:41:58 -0300
Post by Ezequiel Garcia
Hi Alan,
(Adding interested people to this thread)
Post by One Thousand Gnomes
Post by Pavel Machek
Post by Michal Feix
I do feel that the importance of the mentioned bug is currently
underestimated. Can anyone here give a note, how much current linux
kernel is supposed to be stable on general baytrail machines?
If you did not get any replies... you might want to check MAINTAINERS file, and
put Intel x86 maintainers on Cc list.
I'm sure someone cares :-).
Yes we care, and there are people looking at the various reports.
Are there any updates on the status of this issue?
The current bugzilla report [1] marks this as a power management
issue. However, many reports indicate that it would only freeze
when running X, so it's not completely clear if it's related to
the gfx driver too.
There are two things we are currently tracking. One of them was merged,
which seems to have made my machine stable at least, and fixes a problem
related to the MMC. The second one we may need is a power-related change
to SPI to hold the CPU in C0/C1 whenever the ACPI _SEM is held.

Graphics shows these problems up because of the way the GPU causes power
state changes.
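
To see which idle states a particular machine exposes, and so which ones lie beyond C1, a short Python sketch follows. It assumes the cpuidle sysfs directory at /sys/devices/system/cpu/cpu0/cpuidle with per-state `name`, `latency` and `disable` files; the function name is made up for illustration, and attributes that are missing or unreadable are returned as empty strings.

```python
import os
from typing import Dict, List


def list_idle_states(
    cpu_dir: str = "/sys/devices/system/cpu/cpu0/cpuidle",
) -> List[Dict[str, str]]:
    """Return one dict per cpuidle state, ordered by state index."""
    states: List[Dict[str, str]] = []
    if not os.path.isdir(cpu_dir):
        return states
    entries = [
        e for e in os.listdir(cpu_dir)
        if e.startswith("state") and e[5:].isdigit()
    ]
    # Sort numerically so that state10 comes after state9.
    for entry in sorted(entries, key=lambda e: int(e[5:])):
        info = {"state": entry}
        for attr in ("name", "latency", "disable"):
            try:
                with open(os.path.join(cpu_dir, entry, attr)) as f:
                    info[attr] = f.read().strip()
            except OSError:
                info[attr] = ""
        states.append(info)
    return states
```

Printing the result on an affected box and on a stable one would show whether both expose the same deep states.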

Alan
Michal Feix
2016-09-20 13:36:42 UTC
Hi, I think there might be another clue on this one.

One of the comments also mentions an unfixed erratum of certain
Baytrail processors, named "EOI Transaction May Not be Sent if
Software Enters Core C6 During an Interrupt Service Routine". This
erratum can be found on several different processors, even on several
non-baytrails, like Intel Xeon 3400 and similar.

I also came across a patch that was created for SUSE and that seems to
be addressing this issue in pre-4.X kernels:

https://build.opensuse.org/package/view_file?file=22160-Intel-C6-EOI.patch&package=xen&project=home%3Acharlesa%3AopenSUSE11.3&rev=7

---
Michal
l***@gmail.com
2016-10-19 09:40:55 UTC
Hi Michal,

see related subject here:

https://bugzilla.kernel.org/show_bug.cgi?id=109051#c539

Regards,

Wolfgang