Hi,
I added the debug output to the kernel and was able to compile and boot it successfully. After what felt like endless channel zapping, I eventually managed to trigger a kernel oops. I modified xhci-ring.c as follows:
// Everything up to this function is unchanged
void xhci_find_new_dequeue_state(struct xhci_hcd *xhci,
		unsigned int slot_id, unsigned int ep_index,
		unsigned int stream_id, struct xhci_td *cur_td,
		struct xhci_dequeue_state *state)
{
	struct xhci_virt_device *dev = xhci->devs[slot_id];
	struct xhci_ring *ep_ring;
	struct xhci_generic_trb *trb;
	struct xhci_ep_ctx *ep_ctx;
	dma_addr_t addr;

	printk(KERN_INFO "%s %s %d\n", __FILE__, __func__, __LINE__);
	ep_ring = xhci_triad_to_transfer_ring(xhci, slot_id,
			ep_index, stream_id);
	printk(KERN_INFO "%s %s %d\n", __FILE__, __func__, __LINE__);
	if (!ep_ring) {
		xhci_warn(xhci, "WARN can't find new dequeue state "
				"for invalid stream ID %u.\n",
				stream_id);
		return;
	}
	printk(KERN_INFO "%s %s %d\n", __FILE__, __func__, __LINE__);
	state->new_cycle_state = 0;
	printk(KERN_INFO "%s %s %d\n", __FILE__, __func__, __LINE__);
	xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
			"Finding segment containing stopped TRB.");
	printk(KERN_INFO "%s %s %d\n", __FILE__, __func__, __LINE__);
	state->new_deq_seg = find_trb_seg(cur_td->start_seg,
			dev->eps[ep_index].stopped_trb,
			&state->new_cycle_state);
	printk(KERN_INFO "%s %s %d\n", __FILE__, __func__, __LINE__);
	if (!state->new_deq_seg) {
		printk(KERN_INFO "%s %s %d\n", __FILE__, __func__, __LINE__);
		WARN_ON(1);
		printk(KERN_INFO "%s %s %d\n", __FILE__, __func__, __LINE__);
		return;
	}
// The rest is unchanged
With the modified xhci-ring.c, syslog printed the following:
Jul 16 21:50:45 mc kernel: [ 2038.107685] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 559
Jul 16 21:50:45 mc kernel: [ 2038.107690] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 562
Jul 16 21:50:45 mc kernel: [ 2038.107692] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 569
Jul 16 21:50:45 mc kernel: [ 2038.107694] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 571
Jul 16 21:50:45 mc kernel: [ 2038.107696] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 574
Jul 16 21:50:45 mc kernel: [ 2038.107698] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 578
Jul 16 21:50:45 mc kernel: [ 2038.108440] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 559
Jul 16 21:50:45 mc kernel: [ 2038.108443] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 562
Jul 16 21:50:45 mc kernel: [ 2038.108445] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 569
Jul 16 21:50:45 mc kernel: [ 2038.108447] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 571
Jul 16 21:50:45 mc kernel: [ 2038.108448] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 574
Jul 16 21:50:45 mc kernel: [ 2038.108451] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 578
Jul 16 21:50:45 mc kernel: [ 2038.230230] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 559
Jul 16 21:50:45 mc kernel: [ 2038.230238] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 562
Jul 16 21:50:45 mc kernel: [ 2038.230242] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 569
Jul 16 21:50:45 mc kernel: [ 2038.230245] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 571
Jul 16 21:50:45 mc kernel: [ 2038.230248] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 574
Jul 16 21:50:45 mc kernel: [ 2038.230252] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 578
Jul 16 21:50:45 mc kernel: [ 2038.230256] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 580
Jul 16 21:50:45 mc kernel: [ 2038.230259] ------------[ cut here ]------------
Jul 16 21:50:45 mc kernel: [ 2038.230271] WARNING: CPU: 2 PID: 4015 at drivers/usb/host/xhci-ring.c:581 xhci_find_new_dequeue_state+0x3b9/0x3e0()
Jul 16 21:50:45 mc kernel: [ 2038.230274] Modules linked in: pci_stub(F) vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF) vboxdrv(OF) bridge(F) stp(F) llc(F) joydev(F) snd_oxygen(F) snd_oxygen_lib(F) snd_mpu401_uart(F) snd_pcm(F) snd_page_alloc(F) snd_seq_midi(F) snd_seq_midi_event(F) snd_rawmidi(F) snd_seq(F) x86_pkg_temp_thermal(F) intel_powerclamp(F) snd_seq_device(F) coretemp(F) snd_timer(F) kvm_intel(F) kvm(F) crct10dif_pclmul(F) crc32_pclmul(F) ghash_clmulni_intel(F) aesni_intel(F) aes_x86_64(F) lrw(F) gf128mul(F) glue_helper(F) ablk_helper(F) cryptd(F) snd(F) serio_raw(F) i915(F) drm_kms_helper(F) soundcore(F) shpchp(F) bnep(F) rfcomm(F) bluetooth(F) fujitsu_laptop(F) parport_pc(F) drm(F) i2c_algo_bit(F) mei_me(F) mei(F) ppdev(F) lpc_ich(F) video(F) tpm_infineon(F) mac_hid(F) lp(F) parport(F) binfmt_misc(F) hid_sunplus(F) psmouse(F) ahci(F) libahci(F) e1000e(F) ptp(F) pps_core(F) usbhid(F) hid(F)
Jul 16 21:50:45 mc kernel: [ 2038.230385] CPU: 2 PID: 4015 Comm: mediasrv Tainted: GF O 3.13.11.2 #1
Jul 16 21:50:45 mc kernel: [ 2038.230389] Hardware name: FUJITSU D3222-B1/D3222-B1, BIOS V4.6.5.4 R1.23.0 for D3222-B1x 05/16/2014
Jul 16 21:50:45 mc kernel: [ 2038.230393] 0000000000000009 ffff8803d2aebc40 ffffffff8171969b 0000000000000000
Jul 16 21:50:45 mc kernel: [ 2038.230403] ffff8803d2aebc78 ffffffff8106757d ffff880403fc8000 0000000000000002
Jul 16 21:50:45 mc kernel: [ 2038.230411] ffff88040595ed00 0000000000000002 ffff8803d2aebce8 ffff8803d2aebc88
Jul 16 21:50:45 mc kernel: [ 2038.230420] Call Trace:
Jul 16 21:50:45 mc kernel: [ 2038.230432] [<ffffffff8171969b>] dump_stack+0x45/0x56
Jul 16 21:50:45 mc kernel: [ 2038.230440] [<ffffffff8106757d>] warn_slowpath_common+0x7d/0xa0
Jul 16 21:50:45 mc kernel: [ 2038.230446] [<ffffffff8106765a>] warn_slowpath_null+0x1a/0x20
Jul 16 21:50:45 mc kernel: [ 2038.230454] [<ffffffff81576af9>] xhci_find_new_dequeue_state+0x3b9/0x3e0
Jul 16 21:50:45 mc kernel: [ 2038.230461] [<ffffffff8156e4fc>] xhci_cleanup_stalled_ring+0x6c/0x200
Jul 16 21:50:45 mc kernel: [ 2038.230469] [<ffffffff81575c3a>] ? queue_command+0x7a/0xd0
Jul 16 21:50:45 mc kernel: [ 2038.230476] [<ffffffff8156e7be>] xhci_endpoint_reset+0x12e/0x1a0
Jul 16 21:50:45 mc kernel: [ 2038.230484] [<ffffffff815419b5>] usb_hcd_reset_endpoint+0x25/0x70
Jul 16 21:50:45 mc kernel: [ 2038.230491] [<ffffffff815449b5>] usb_enable_endpoint+0x85/0x90
Jul 16 21:50:45 mc kernel: [ 2038.230496] [<ffffffff81544a02>] usb_enable_interface+0x42/0x60
Jul 16 21:50:45 mc kernel: [ 2038.230500] [<ffffffff81544c2f>] usb_set_interface+0x20f/0x360
Jul 16 21:50:45 mc kernel: [ 2038.230506] [<ffffffff8154ee2b>] usbdev_do_ioctl+0xa0b/0x1050
Jul 16 21:50:45 mc kernel: [ 2038.230510] [<ffffffff8154f49e>] usbdev_ioctl+0xe/0x20
Jul 16 21:50:45 mc kernel: [ 2038.230517] [<ffffffff811cfa00>] do_vfs_ioctl+0x2e0/0x4c0
Jul 16 21:50:45 mc kernel: [ 2038.230522] [<ffffffff8109db54>] ? vtime_account_user+0x54/0x60
Jul 16 21:50:45 mc kernel: [ 2038.230528] [<ffffffff811cfc61>] SyS_ioctl+0x81/0xa0
Jul 16 21:50:45 mc kernel: [ 2038.230533] [<ffffffff8172a2bf>] tracesys+0xe1/0xe6
Jul 16 21:50:45 mc kernel: [ 2038.230536] ---[ end trace 55332440026aa7a2 ]---
Jul 16 21:50:45 mc kernel: [ 2038.230539] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 582
Jul 16 21:50:45 mc kernel: [ 2038.230551] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
Jul 16 21:50:45 mc kernel: [ 2038.230619] IP: [<ffffffff81576b9d>] xhci_queue_new_dequeue_state+0x7d/0x1e0
Jul 16 21:50:45 mc kernel: [ 2038.230676] PGD 3e0be9067 PUD 3de6d8067 PMD 0
Jul 16 21:50:45 mc kernel: [ 2038.230718] Oops: 0000 [#1] SMP
Jul 16 21:50:45 mc kernel: [ 2038.230748] Modules linked in: pci_stub(F) vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF) vboxdrv(OF) bridge(F) stp(F) llc(F) joydev(F) snd_oxygen(F) snd_oxygen_lib(F) snd_mpu401_uart(F) snd_pcm(F) snd_page_alloc(F) snd_seq_midi(F) snd_seq_midi_event(F) snd_rawmidi(F) snd_seq(F) x86_pkg_temp_thermal(F) intel_powerclamp(F) snd_seq_device(F) coretemp(F) snd_timer(F) kvm_intel(F) kvm(F) crct10dif_pclmul(F) crc32_pclmul(F) ghash_clmulni_intel(F) aesni_intel(F) aes_x86_64(F) lrw(F) gf128mul(F) glue_helper(F) ablk_helper(F) cryptd(F) snd(F) serio_raw(F) i915(F) drm_kms_helper(F) soundcore(F) shpchp(F) bnep(F) rfcomm(F) bluetooth(F) fujitsu_laptop(F) parport_pc(F) drm(F) i2c_algo_bit(F) mei_me(F) mei(F) ppdev(F) lpc_ich(F) video(F) tpm_infineon(F) mac_hid(F) lp(F) parport(F) binfmt_misc(F) hid_sunplus(F) psmouse(F) ahci(F) libahci(F) e1000e(F) ptp(F) pps_core(F) usbhid(F) hid(F)
Jul 16 21:50:45 mc kernel: [ 2038.231452] CPU: 2 PID: 4015 Comm: mediasrv Tainted: GF W O 3.13.11.2 #1
Jul 16 21:50:45 mc kernel: [ 2038.231530] Hardware name: FUJITSU D3222-B1/D3222-B1, BIOS V4.6.5.4 R1.23.0 for D3222-B1x 05/16/2014
Jul 16 21:50:45 mc kernel: [ 2038.231596] task: ffff8803d28a8000 ti: ffff8803d2aea000 task.ti: ffff8803d2aea000
Jul 16 21:50:45 mc kernel: [ 2038.231649] RIP: 0010:[<ffffffff81576b9d>] [<ffffffff81576b9d>] xhci_queue_new_dequeue_state+0x7d/0x1e0
Jul 16 21:50:45 mc kernel: [ 2038.231723] RSP: 0018:ffff8803d2aebc70 EFLAGS: 00010046
Jul 16 21:50:45 mc kernel: [ 2038.231762] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
Jul 16 21:50:45 mc kernel: [ 2038.231813] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff880403fc8000
Jul 16 21:50:45 mc kernel: [ 2038.231863] RBP: ffff8803d2aebcd0 R08: ffff8803d2aebce8 R09: ffff880403fc8000
Jul 16 21:50:45 mc kernel: [ 2038.231913] R10: 0000000000000000 R11: ffff8803d2aeb9c6 R12: ffff8804038101f8
Jul 16 21:50:45 mc kernel: [ 2038.231964] R13: 0000000000000002 R14: ffff8803d2aebce8 R15: ffff880403fc8000
Jul 16 21:50:45 mc kernel: [ 2038.232015] FS: 00007f0315399700(0000) GS:ffff88041e300000(0000) knlGS:0000000000000000
Jul 16 21:50:45 mc kernel: [ 2038.232073] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 16 21:50:45 mc kernel: [ 2038.232114] CR2: 0000000000000010 CR3: 00000003e0be8000 CR4: 00000000001407e0
Jul 16 21:50:45 mc kernel: [ 2038.232165] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 16 21:50:45 mc kernel: [ 2038.232215] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul 16 21:50:45 mc kernel: [ 2038.232265] Stack:
Jul 16 21:50:45 mc kernel: [ 2038.232282] ffff8803d2aebc78 0000000000000018 ffff8803d2aebce0 ffff8803d2aebc90
Jul 16 21:50:45 mc kernel: [ 2038.232344] ffffffff81576b1a ffff880300000000 ffff880400000002 ffff880403fc8000
Jul 16 21:50:45 mc kernel: [ 2038.232406] 0000000000000002 ffff8804038101f8 ffff8804079a1000 ffff8804038101f8
Jul 16 21:50:45 mc kernel: [ 2038.232468] Call Trace:
Jul 16 21:50:45 mc kernel: [ 2038.232491] [<ffffffff81576b1a>] ? xhci_find_new_dequeue_state+0x3da/0x3e0
Jul 16 21:50:45 mc kernel: [ 2038.232543] [<ffffffff8156e652>] xhci_cleanup_stalled_ring+0x1c2/0x200
Jul 16 21:50:45 mc kernel: [ 2038.232596] [<ffffffff81575c3a>] ? queue_command+0x7a/0xd0
Jul 16 21:50:45 mc kernel: [ 2038.232639] [<ffffffff8156e7be>] xhci_endpoint_reset+0x12e/0x1a0
Jul 16 21:50:45 mc kernel: [ 2038.232685] [<ffffffff815419b5>] usb_hcd_reset_endpoint+0x25/0x70
Jul 16 21:50:45 mc kernel: [ 2038.232731] [<ffffffff815449b5>] usb_enable_endpoint+0x85/0x90
Jul 16 21:50:45 mc kernel: [ 2038.232776] [<ffffffff81544a02>] usb_enable_interface+0x42/0x60
Jul 16 21:50:45 mc kernel: [ 2038.232822] [<ffffffff81544c2f>] usb_set_interface+0x20f/0x360
Jul 16 21:50:45 mc kernel: [ 2038.232866] [<ffffffff8154ee2b>] usbdev_do_ioctl+0xa0b/0x1050
Jul 16 21:50:45 mc kernel: [ 2038.232911] [<ffffffff8154f49e>] usbdev_ioctl+0xe/0x20
Jul 16 21:50:45 mc kernel: [ 2038.232951] [<ffffffff811cfa00>] do_vfs_ioctl+0x2e0/0x4c0
Jul 16 21:50:45 mc kernel: [ 2038.232993] [<ffffffff8109db54>] ? vtime_account_user+0x54/0x60
Jul 16 21:50:45 mc kernel: [ 2038.233039] [<ffffffff811cfc61>] SyS_ioctl+0x81/0xa0
Jul 16 21:50:45 mc kernel: [ 2038.233078] [<ffffffff8172a2bf>] tracesys+0xe1/0xe6
Jul 16 21:50:45 mc kernel: [ 2038.233113] Code: 00 48 85 c9 0f 84 74 01 00 00 4c 89 ce 48 89 cf 89 55 b4 4c 89 4d b8 48 89 4d c0 e8 4e f6 ff ff 48 8b 4d c0 4c 8b 4d b8 8b 55 b4 <4c> 8b 41 10 48 c7 c6 20 5f 57 81 89 54 24 08 48 89 04 24 48 c7
Jul 16 21:50:45 mc kernel: [ 2038.233396] RIP [<ffffffff81576b9d>] xhci_queue_new_dequeue_state+0x7d/0x1e0
Jul 16 21:50:45 mc kernel: [ 2038.233452] RSP <ffff8803d2aebc70>
Jul 16 21:50:45 mc kernel: [ 2038.235481] CR2: 0000000000000010
Jul 16 21:50:45 mc kernel: [ 2038.248541] ---[ end trace 55332440026aa7a3 ]---
To analyze the output I used the following documentation:
http://www.opensourceforu.com/2011/01/understanding-a-kernel-oops/.
As Sundtek already explained correctly, this is a kernel bug, which becomes obvious at the latest from the line
Jul 16 21:50:45 mc kernel: [ 2038.230718] Oops: 0000 [#1] SMP
It speaks all the more for Sundtek's support that they are helping to solve this problem anyway. Great!
As far as I understand the trace, the output from "Jul 16 21:50:45 mc kernel: [ 2038.230259]" to "Jul 16 21:50:45 mc kernel: [ 2038.230536]" is caused by the line
WARN_ON(1);
in the function xhci_find_new_dequeue_state. That warning itself is probably not what crashes the kernel, though, because otherwise the output
Jul 16 21:50:45 mc kernel: [ 2038.230539] drivers/usb/host/xhci-ring.c xhci_find_new_dequeue_state 582
would not have been possible. The kernel oops starts at "Jul 16 21:50:45 mc kernel: [ 2038.230551]". The line
Jul 16 21:50:45 mc kernel: [ 2038.230619] IP: [<ffffffff81576b9d>] xhci_queue_new_dequeue_state+0x7d/0x1e0
shows where the instruction pointer was when the NULL pointer dereference occurred. So the fault is triggered in the function "xhci_queue_new_dequeue_state". With
(gdb) disassemble xhci_queue_new_dequeue_state
I looked up the start address of the function:
Dump of assembler code for function xhci_queue_new_dequeue_state:
0x00000000000016a0 <+0>: callq 0x16a5 <xhci_queue_new_dequeue_state+5>
Then a bit of arithmetic: 0x7d + 0x16a0 = 0x171d, and with
(gdb) list *0x171d
I had gdb print the offending code:
0x171d is in xhci_queue_new_dequeue_state (drivers/usb/host/xhci-ring.c:702).
697 unsigned int stream_id,
698 struct xhci_dequeue_state *deq_state)
699 {
700 struct xhci_virt_ep *ep = &xhci->devs[slot_id]->eps[ep_index];
701
702 xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
703 "Set TR Deq Ptr cmd, new deq seg = %p (0x%llx dma), "
704 "new deq ptr = %p (0x%llx dma), new cycle = %u",
705 deq_state->new_deq_seg,
706 (unsigned long long)deq_state->new_deq_seg->dma,
So the statement starting at line 702 appears to be the one that fails. Since I don't program in C myself, I would need a bit of coaching on how to check which argument of xhci_dbg_trace is NULL.