Touchpad, Interrupted
For two years I've been driving myself crazy trying to figure out the source of a driver problem on OpenBSD: interrupts never arrived for certain touchpad devices. While debugging an unrelated issue over the weekend, I finally solved it.
It's been a long journey and it's a technical tale, but here it is.
Windows Precision
In 2015, I purchased a Samsung ATIV Book 9 laptop. Its touchpad was different from those in most other laptops previously used with OpenBSD, and would be a model for most touchpads to come after it: a Windows Precision Touchpad connected over I2C.
Most other laptops had a touchpad connected through the 8042 (PS/2) controller along with the keyboard, emulating the historical design of PCs having two PS/2 ports for an external mouse and keyboard. These touchpads from Synaptics, Elan, and ALPS each spoke a proprietary protocol and were rather bandwidth-constrained in how much finger data they could communicate back to the OS, which became a problem once multi-touch gestures became a thing in Windows.
For these devices, Microsoft produced its Windows Precision Touchpad specification and would handle the driver side of things. This allowed vendors to ship touchpads that shared a common driver and worked in Windows out of the box, and allowed Microsoft to provide a better touchpad experience with gestures and palm rejection (though still not one that rivals what Apple does with the Broadcom touchpads on their MacBooks).
OpenBSD Support
In 2016, I finished writing drivers for these touchpads, which required an I2C controller driver (dwiic), a HID-over-I2C driver (ihidev), a basic I2C-HID mouse driver (ims), then a full transport-agnostic HID driver implementing the Windows Precision Touchpad spec (hidmt), and finally an I2C touchpad driver (imt) to interface between ihidev and hidmt.
Shortly after, some laptops started showing up with their keyboard connected
over I2C as well, requiring the ikbd driver.
In 2018, I wrote umt to support USB-connected Windows Precision devices in use on some laptop touchscreens.
While all of this worked fairly well and somewhat modernized OpenBSD's non-ThinkPad laptop support (many ThinkPads up until some 2019 models still used a PS/2-connected touchpad and TrackPoint), there was one aspect that didn't work: on Broadwell chipsets, the touchpad would not wake up after an S3 suspend/resume.
Bug-Hunting
Later in 2016, I purchased a Chromebook Pixel and got
OpenBSD running on it.
The Pixel also had its touchpad and touchscreen connected over I2C,
though being a Chromebook not running Windows, its touchpad did not conform to
the Windows Precision Touchpad standard, which meant it needed a new driver (iatp).
The Chromebook Pixel was also a Broadwell chipset and this new driver had the
same issue: communication with it failed after an S3 resume.
Two different vendors of touchpads and two different drivers, but the same
problem.
The I2C controller (dwiic) worked fine after resume, but any time it tried to communicate with the touchpad device, everything would just time out.
After some months of debugging on Linux, I tracked down the fix to a single write to a register on the I2C controller device, found in Linux's Intel Low Power Subsystem (LPSS) driver for power gating.
Intel's LPSS is used with their I2C and SPI devices to limit power usage
by quickly shutting off components when idle.
The way this is implemented in Linux is kind of confusing, and even now looking at their main LPSS driver I can't see where the 0x800 register comes from that OpenBSD's driver writes to the I2C controller in order to power up the I2C slave device.
That Linux driver registers a clock (clk) device and the clk framework
handles the register writing itself rather than calling back to a function in
the LPSS driver, which is why it took me so long to find it in 2016.
Interrupting ihidev
In 2017, I purchased a Huawei Matebook X with a Kaby Lake chipset which Intel refers to as the 100 Series.
Intel's I2C controllers on this chipset now show up as actual PCI devices, which meant splitting up the dwiic driver to handle both PCI and ACPI attachments.
The dwiic driver fetches ACPI resource information for I2C slave devices that are connected to it, like the touchpad. That resource includes the I2C slave address and interrupt pin that it is connected to on the IOAPIC.
ihidev then attaches and uses the standard methods in the OpenBSD kernel to program the ioapic device to register a callback to ihidev whenever that pin receives an interrupt.
Despite all of that being set up with the proper address and pin (which matched what Linux did), the IOAPIC would never receive an interrupt on that pin and ihidev would never have its interrupt handler called when the touchpad was touched.
The touchpad was being properly powered up and would respond to I2C HID commands, and
if polled after touching, there was finger data available to read.
It just never generated an interrupt.
As with the S3 resume issue, I spent months trying to figure out what was happening with these missing interrupts. I attended the OpenBSD t2k17 Hackathon and spent nearly a week straight in a room full of OpenBSD developers as I tried tearing apart the Linux I2C, LPSS, IOAPIC, and ACPI code with no luck.
As I heard reports from other users and developers with Intel 100 Series machines with the same interrupt problem, I started to assume it was specific to these newer chipsets. I went digging through Intel documentation and I2C implementations in other OSes (such as Coreboot and Google's Zircon kernel) to find anything related to this specific hardware.
Growing weary and admitting defeat, I added an adaptive polling mechanism to ihidev
so the kernel would poll the device every 200ms until there was
touch data available, then poll at 10ms until shortly after it stopped receiving
new data.
This was enough to get touchpads working on these new laptops, but it was slow
and wasted a bit of CPU time and battery power.
Unfortunately that "temporary" polling mechanism had to be used for the next two
years as no one could fix (or was not interested in fixing) this problem.
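The adaptive polling described above can be sketched roughly as follows. This is a small Python model rather than the actual ihidev code: the 200ms and 10ms intervals come from the text, while the class name, method name, and the 500ms grace period standing in for "shortly after" are assumptions made for illustration.

```python
SLOW_MS = 200   # idle polling interval (from the text)
FAST_MS = 10    # polling interval while touch data is arriving (from the text)
GRACE_MS = 500  # hypothetical: how long to keep polling fast after the last data


class AdaptivePoller:
    """Toy model of a poller that speeds up while the touchpad is in use."""

    def __init__(self):
        self.last_data_ms = None  # time of the most recent poll that returned data

    def next_interval(self, now_ms, got_data):
        """Return how long to wait before the next poll, in milliseconds."""
        if got_data:
            self.last_data_ms = now_ms
        recently_active = (
            self.last_data_ms is not None
            and now_ms - self.last_data_ms <= GRACE_MS
        )
        return FAST_MS if recently_active else SLOW_MS
```

Even in this toy form the cost is visible: an idle touchpad still wakes the CPU five times a second, and the first touch can wait up to 200ms before it is noticed.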
ACPI Node Walking
A few weeks ago, I purchased the 7th generation ThinkPad X1 Carbon. Getting OpenBSD installed and working on it has been quite a feat, as there were multiple bugs to fix. The first showstopper was a kernel panic shortly into booting the installer, caused by OpenBSD's AML parser reporting "Not Integer" when executing a particular method.
For some quick background: Linux and most smaller operating systems use an ACPI interpreter called ACPICA which is written and maintained by Intel. OpenBSD and Windows each use their own custom-developed ACPI stacks. Presumably Microsoft has many engineers available to maintain their ACPI implementation (since they also wrote and maintain the official ACPI specification with Intel) and every other OS just re-imports the ACPICA code from Intel when it's updated. Unfortunately on OpenBSD, this means we have to fix bugs and implement new functionality required by the ACPI spec (now at version 6.3) when we encounter them on new hardware.
The "Not Integer" panic on the X1 was caused by this AML in an _INI method (ironically, on its touchpad device):
Method (_INI, 0, NotSerialized)  // _INI: Initialize
{
    GPDI = 0x64
    If ((OSYS < 0x07DC))
    {
        SRXO (GPDI, One)
    }
    INT1 = GNUM (GPDI)
    INT2 = INUM (GPDI)
    [..]
    If ((TPDT == 0x05))
    {
        If ((^^^LPCB.NFCD == Zero))
        {
            _HID = "SYNA8005"
        }
        Else
        {
            _HID = "SYNA8004"
        }
        ADBG (Concatenate ("TPD0 _HID:", ToHexString (_HID)))
        HID2 = 0x20
        BADR = 0x2C
        ADBG (Concatenate ("TPD0 _INI:BADR=", ToHexString (BADR)))
        Return (Zero)
    }
When ACPI is being initialized in ACPICA or OpenBSD's ACPI code, it walks the entire DSDT tree looking for any methods named _INI and executes them. This is how certain variables get initialized, interrupts get set up, and anything else the hardware vendor needs gets done.
At this point you may be thinking: maybe there's just an _INI function that OpenBSD is not executing that is needed to fix the touchpad interrupt problem. I checked this a long time ago: I listed out all of the _INI method calls that OpenBSD made and compared them to Linux's. The results were similar enough that I didn't investigate further.
The ToHexString operator in that _INI function is one built into ACPI and is supposed to convert string or integer data into a string of hexadecimal characters. The way it was implemented in OpenBSD's AML parser 11 years ago was to only accept integer arguments, so anything passed to it that wasn't an integer (such as the _HID string above) would cause an AML panic.
After reviewing the ACPI specification, the fix was just to allow passing other types to the ToHexString (and ToDecString) functions since the underlying OpenBSD implementations already handled converting non-integer types.
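The shape of that change can be illustrated with a short Python sketch. It is not OpenBSD's AML code: the function name, the integers_only flag that mimics the old behaviour, and the exact output formatting (bare uppercase hex, strings passed through unchanged, buffers as comma-separated bytes) are all assumptions for illustration, so check the ACPI specification for the real conversion rules.

```python
def to_hex_string(value, integers_only=False):
    """Toy stand-in for an AML ToHexString implementation.

    integers_only=True mimics the old behaviour of rejecting anything that
    is not an integer; the default mimics the relaxed behaviour.
    """
    if integers_only and not isinstance(value, int):
        # The old parser gave up here, which surfaced as the panic.
        raise TypeError("Not Integer")
    if isinstance(value, int):
        return format(value, "X")  # assumed format: bare uppercase hex
    if isinstance(value, str):
        return value               # assumed: a string needs no conversion
    if isinstance(value, (bytes, bytearray)):
        return ",".join(format(b, "02X") for b in value)
    raise TypeError("unsupported type: " + type(value).__name__)


print(to_hex_string(0x2C))        # the BADR value from the listing
print(to_hex_string("SYNA8005"))  # the _HID string that used to panic
```

With integers_only=True, the second call raises instead of returning, which is the situation the touchpad's _INI method ran into.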
However, while debugging that crash, I noticed something strange.
The first conditional in that _INI method checks against OSYS, which is a global variable that most DSDTs compute according to which version of whichever operating system it's running on.
There's a long history related to _OSI that I won't go into, but basically every OS claims to be Windows
now, except on Apple hardware, where we all claim to be Darwin, because it's
easier for other OSes to behave like Windows and macOS than for the hardware
vendors to update their BIOS code when a driver issue in Linux is fixed.
--== Eval Method [\_SB_.PCI0.I2C1.TPD0._INI, 0 args] to t ==--
===== Stack \_SB_.PCI0.I2C1.TPD0._INI:Method
parsename: \GPDI 5
write 00 6fb1847a 0020 [\GNVS]
parsename: \OSYS 5
read 00 6fb18000 0010 [\GNVS]
aml_evalexpr: LLess 0 7dc = ffffffffffffffff
quick: 203a8 [LLess] alloc return integer = 0xffffffffffffffff
parse-if @ 203a6
parsename: \_SB_.SRXO 8
The AML If ((OSYS < 0x07DC)) was being turned into a conditional LLess 0 7dc, but why was OSYS zero?
Looking elsewhere in the DSDT, OSYS is initialized like so:
Scope (_SB.PCI0)
{
    [...]
    Method (_INI, 0, Serialized)  // _INI: Initialize
    {
        TBPE = One
        OSYS = 0x03E8
        If (CondRefOf (\_OSI))
        {
            If (_OSI ("Windows 2001"))
            {
                WNTF = One
                WXPF = One
                WSPV = Zero
                OSYS = 0x07D1
            }
            If (_OSI ("Windows 2001 SP1"))
            {
                WSPV = One
                OSYS = 0x07D1
            }
            [...]
            If (_OSI ("Windows 2015"))
            {
                WIN8 = One
                OSYS = 0x07DF
            }
Basically for each newer version of Windows that the system reports it is compatible with (OpenBSD reports up to Windows 2015), OSYS is updated to a higher value.
That OSYS variable is then used in various other DSDT methods related to setting
up devices, basically to allow backwards compatibility if the machine is
being used with older versions of Windows that may not be able to deal with a
device set up in one particular way vs. another.
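As a rough model of that cascade, here is a Python sketch. It includes only the three _OSI strings visible in the listing (the entries elided there are left out), and the function name and the set-of-strings interface are made up for illustration.

```python
# (_OSI string, OSYS value) pairs in the order the DSDT tests them.
# Only the entries visible in the listing above are included.
OSI_TABLE = [
    ("Windows 2001", 0x07D1),
    ("Windows 2001 SP1", 0x07D1),
    ("Windows 2015", 0x07DF),
]


def compute_osys(claimed):
    """Return OSYS for an OS that answers true to the _OSI strings in `claimed`."""
    osys = 0x03E8  # default assigned before any _OSI checks
    for name, value in OSI_TABLE:
        if name in claimed:
            osys = value  # later matches overwrite earlier ones
    return osys


print(hex(compute_osys({"Windows 2001", "Windows 2015"})))  # 0x7df
print(compute_osys({"Windows 2015"}) < 0x07DC)              # False
print(0 < 0x07DC)                                           # True
```

The last line is the interesting one: an OSYS that has not been initialized yet compares as older than every Windows version the DSDT knows about.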
So if OSYS is being initialized in _SB.PCI0._INI, why is it zero when doing the conditional in the touchpad's _INI method?
Well as it turns out, the way that OpenBSD's ACPI stack was walking the entire DSDT tree looking for _INI methods was slightly different from how ACPICA (and presumably Windows) does it.
On OpenBSD, nodes were being walked in this order:
\_SB_.PCI0.LPCB.EC0_._INI
\_SB_.PCI0.LPCB.EC0_.ALSD._INI
\_SB_.PCI0.XHC_._INI
[...]
\_SB_.PCI0.I2C1.TPL1._INI
\_SB_.PCI0._INI
But in ACPICA, they were walked in this order:
\_SB_.PCI0._INI
\_SB_.PCI0.LPCB.EC0_._INI
\_SB_.PCI0.LPCB.EC0_.ALSD._INI
\_SB_.PCI0.XHC_._INI
[...]
\_SB_.PCI0.I2C1.TPL1._INI
That slight change in ordering was the entire cause of the interrupt problem.
Earlier I wrote that I checked the list of _INI calls in Linux vs. OpenBSD,
but I didn't realize the order of them was so important and that they had
interdependencies that weren't explicit.
When \_SB_.PCI0.I2C1.TPL1._INI was executed first, OSYS was still zero, meaning that conditional mentioned earlier was returning true, executing SRXO (GPDI, One).
After that, \_SB_.PCI0._INI was being executed, properly initializing OSYS.
When ihidev would attach later, it would call the touchpad device's _CRS method to retrieve information about the I2C slave address and interrupt information that was supposed to be set up earlier in its _INI method:
Method (_CRS, 0, NotSerialized)  // _CRS: Current Resource Settings
{
    If ((OSYS < 0x07DC))
    {
        Return (SBFI) /* \_SB_.PCI0.I2C1.TPD0.SBFI */
    }
    [...]
By this time, OSYS was properly set, and it would return resource information
saying that its interrupts were routing through the IOAPIC on a particular pin,
and OpenBSD would try to configure the IOAPIC accordingly.
However, that didn't match what the firmware was actually doing earlier when _INI was executed, because it was being told to route its interrupt through some other mechanism or perhaps it never activated anything.
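The mismatch is easier to see when reduced to the one comparison both methods share. The Python sketch below models only that OSYS < 0x07DC test. What SRXO and SBFI actually do is not modelled, and the function names are invented for illustration.

```python
OSYS_THRESHOLD = 0x07DC  # the comparison used by both _INI and _CRS above


def takes_old_os_branch(osys):
    """True when OSYS < 0x07DC, the test both touchpad methods perform."""
    return osys < OSYS_THRESHOLD


def simulate(pci0_ini_runs_first):
    """Return (branch taken in _INI, branch taken in _CRS) for one walk order."""
    osys = 0  # not yet initialized, as seen in the AML trace
    if pci0_ini_runs_first:
        osys = 0x07DF  # the value the PCI0 _INI assigns for "Windows 2015"
    ini_branch = takes_old_os_branch(osys)  # the touchpad's _INI runs now
    osys = 0x07DF  # by the time ihidev attaches, the PCI0 _INI has run either way
    crs_branch = takes_old_os_branch(osys)  # ihidev calls _CRS
    return ini_branch, crs_branch


print(simulate(pci0_ini_runs_first=False))  # (True, False): the two disagree
print(simulate(pci0_ini_runs_first=True))   # (False, False): the two agree
```

In the first case the firmware was initialized as if for an older OS, while the driver was later handed resources meant for a newer one.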
The fix for this was to change the node walk algorithm to match ACPICA and execute a matching child node (_INI) of a device before recursing through its child devices.
With that change in place, it now properly executes \_SB_.PCI0._INI before \_SB_.PCI0.I2C1.TPL1._INI, ensuring that OSYS is set before it's read.
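To make the difference between the two walks concrete, here is a toy Python model. It is not OpenBSD's or ACPICA's code. The tree contains only the nodes named in the listings above, and the position of each _INI among its siblings (last under PCI0, first under EC0_) is an assumption chosen so that the first walk reproduces the old OpenBSD order.

```python
def node(name, *children):
    """Build a (name, children) pair for the toy namespace."""
    return (name, list(children))


# Hypothetical miniature namespace; sibling order is an assumption.
TREE = node("\\_SB_",
    node("PCI0",
        node("LPCB",
            node("EC0_",
                node("_INI"),
                node("ALSD", node("_INI")))),
        node("XHC_", node("_INI")),
        node("I2C1", node("TPL1", node("_INI"))),
        node("_INI")))


def walk_in_sibling_order(n, target="_INI", path=None, found=None):
    """Old behaviour (as modelled here): handle each child as it is met."""
    name, children = n
    path = name if path is None else path + "." + name
    found = [] if found is None else found
    for child in children:
        if child[0] == target:
            found.append(path + "." + target)
        walk_in_sibling_order(child, target, path, found)
    return found


def walk_match_first(n, target="_INI", path=None, found=None):
    """New behaviour: run a node's own matching child, then recurse."""
    name, children = n
    path = name if path is None else path + "." + name
    found = [] if found is None else found
    for child in children:
        if child[0] == target:
            found.append(path + "." + target)
    for child in children:
        walk_match_first(child, target, path, found)
    return found


print("\n".join(walk_in_sibling_order(TREE)))  # PCI0's _INI comes last
print("\n".join(walk_match_first(TREE)))       # PCI0's _INI comes first
```

The set of methods executed is identical in both walks, which is why a comparison of the lists without regard to order looked fine.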
With that fix in place, I was happy to finally disable forced-polling in dwiic for ihidev.
In the end, the bug had nothing to do with the devices being Intel 100 Series,
and was most likely affecting all of them similarly because their vendors all
used the same DSDT template from Intel, which uses OSYS in device _INI methods without an explicit dependency on \_SB_.PCI0._INI to initialize it.
These fixes are now in the OpenBSD tree and have been in recent snapshots, so if this bug affected you and you want to try it out with proper interrupts, try the most recent snapshot.