Debugging Techniques

Debugging Support in the Kernel

  • Configuration options in kernel
    • CONFIG_DEBUG_KERNEL
      • Just make other debugging options available.
    • CONFIG_DEBUG_SLAB
      • Set allocated memory to 0xa5.
      • Set freed memory to 0x6b.
    • CONFIG_DEBUG_PAGEALLOC
      • Full pages are removed from the kernel address space when freed.
    • CONFIG_DEBUG_SPINLOCK
      • The kernel catches operations on uninitialized spinlocks and various other errors (e.g., unlocking a lock twice).
    • CONFIG_DEBUG_SPINLOCK_SLEEP
      • This option enables a check for attempts to sleep while holding a spinlock.
    • CONFIG_INIT_DEBUG
      • This option enables checks for code that attempts to access initialization-time memory after initialization is complete.
    • CONFIG_DEBUG_INFO
      • Build the kernel with full debugging information included. (when use gdb)
      • You may also want to enable CONFIG_FRAME_POINTER if you plan to use gdb.
    • CONFIG_MAGIC_SYSRQ
      • Enables the “magic SysRq” key.
    • CONFIG_DEBUG_STACKOVERFLOW
    • CONFIG_DEBUG_STACK_USAGE
      • Help to track down kernel stack overflows.
    • CONFIG_KALLSYMS
      • Build kernel symbol information into the kernel; without it, an oops listing can be tracebacked only in hexadecimal.
    • CONFIG_IKCONFIG
    • CONFIG_IKCONFIG_PROC
      • Build full kernel configuration state into kernel and to be made available via /proc.
    • CONFIG_ACPI_DEBUG
      • Turns on verbose ACPI (Advanced Configuration and Power Interface) debugging information.
    • CONFIG_DEBUG_DRIVER
      • Turns on debugging information in the driver core.
    • CONFIG_SCSI_CONSTANTS
      • Builds in information for verbose SCSI error messages.
    • CONFIG_INPUT_EVBUG
      • Turns on verbose logging of input events.
      • It logs everything you type, including your passwords.
    • CONFIG_PROFILING
      • Profiling is normally used for system performance tuning.

Debugging by Printing

  • application programming : printf.
  • kernel code : printk.

printk

  • Logdevel
    • Defined in <linux/kernel.h>.
    • Smaller values representing higher priorities.
    • KERN_EMERG (0)
      • Used for emergency messages, usually those that precede a crash.
    • KERN_ALERT (1)
      • A situation requiring immediate action.
    • KERN_CRIT (2)
      • Critical conditions, often related to serious hardware or software failures.
    • KERN_ERR (3)
      • Used to report error conditions.
    • KERN_WARNING (4)
      • Warnings about problematic situations that do not, in themselves, create serious problems with the system.
    • KERN_NOTICE (5)
      • Situations that are normal, but still worthy of note.
      • Security-related conditions are reported at this level.
    • KERN_INFO (6)
      • Informational messages.
    • KERN_DEBUG (7)
      • Used for debugging messages.
  • dmesg command : read /proc/kmsg
  • console_loglevel
    • /proc/sys/kernel/printk
    • If it is set to 1, only messages of level 0 (KERN_EMERG) reach the console; if it is set to 8, all messages, including debugging ones, are displayed.

Redirecting Console Messages

  • By default, the “console” is the current virtual terminal.
  • Issue ioctl(TIOCLINUX) on any console device to receive messages from a different virtual terminal.
    int main(int argc, char **argv) {
      char bytes[2] = {11,0}; /* 11 is the TIOCLINUX cmd number */
    
      if (argc==2) bytes[1] = atoi(argv[1]); /* the chosen console */
      else {
          fprintf(stderr, "%s: need a single arg\n", argv[0]); exit(1);
      }
      if (ioctl(STDIN_FILENO, TIOCLINUX, bytes)<0) { /* use stdin */
          fprintf(stderr,"%s: ioctl(stdin, TIOCLINUX): %s\n",
                  argv[0], strerror(errno));
          exit(1);
      }
      exit(0);
    }
    
  • TIOCLINUX can be found in drivers/char/tty_io.c.

How Messages Get Logged

  • Methods for monitoring messages from driver without ruins system log.
    • specify -f (file) option to klogd to save messages to a specific file.
    • customize /etc/syslog.conf.
    • (Brute-force) kill klogd, then print messages on an unused virtual terminal, or issue cat /proc/kmsg from an unused xterm.

Turning the Messages On and Off

#undef DBGMSG
#ifdef SCULL_DEBUG
#  ifdef __KERNEL__
     /* debug message in kernel space */
#    define DBGMSG(fmt, args...) printk(KERN_DEBUG "DBGMSG: " fmt, ## args)
#  else
	 /* debug message in user space */
#    define DBGMSG(fmt, args...) fprintf(stderr, fmt, ## args)
#  endif
#else
#  define DBGMSG(fmt, args...)
#endif

#undef DBGMSGG
#define DEBMSGG(fmt, args...) /* for diabling single message */
# Comment/uncomment the following line to disable/enable degugging
DEBUG = y

# Add your debugging flag (or not) to CFLAGS
ifeq ($(DEBUG),y)
	DEBFLAGS = -O -g -DSCULL_DEBUG # "-O" is needed to expand inlines
else
	DEBFLAGS = -O2
endif

EXTRA_CFLAGS += $(DEBFLAGS)

depend .depend dep:
	$(CC) $(EXTRA_CFLAGS) -M *.c > .depend

ifeq (.depend, $(wildcard .depend))
include .depend
endif
  • The art of good programming is in choosing the best trade-off between flexibility and efficiency.

Rate Limiting

  • When using a slow console device (e.g., a serial port), an excessive message rate can also slow down the system or just make it unresposive.
  • In general, production code should never print anything during normal operation; printed output should be an indication of an exceptional situation requiring attention.
  • In many cased, the best behavior is to set a flag saying, “I have already complained about this”, and not print any further messages once the flag gets set.
  • If you really need to print messages continuously, use int printk_ratelimit(void) to limit print frequency.
if (printk_ratelimit())
    printk(KERN_NOTICE "The printer is still on fire.\n");
  • The behavior of printk_ratelimit can be customized by modifying:
    • /proc/sys/kernel/printk_ratelimit
      • The number of second to wait before re-enabling messages.
    • /proc/sys/kernel/printk_ratelimit_burst
      • The number of messages accepted betore ratelimiting.

Printing Device Numbers

  • In the interest of consistency:
#include <linux/kdev_t.h>

// returns the number of characters printed.
int print_dev_t(char *buffer, dev_t dev);

// returns the buffer.
char *format_dev_t(char *buffer, dev_t dev);

Debugging by Querying

  • Disadvantages of using printk.
    • slow down the system
    • syslogd keeps syncing its output files, every line that is printed causes a disk operation.
  • The best way to get relevant information is to query the system when you need the information, instead of continually producing data. (e.g., ps, netstat, vmstat, etc)
  • A few techniques for driver developers to query the system:
    • creating a file in the /proc filesystem
    • using the ioctl driver method
    • exporting attributes via sysfs. (CH14)

Using the /proc Filesystem

  • The /proc filesystem is a special, software-created filesystem that is used by the kernel to export information to the world.
  • Each file under /proc is tied to a kernel function that generates the file’s “contents” when the file is read.
    • e.g., /proc/modules
      • returns a list of the currently loaded modules.
  • The /proc filesystem is dynamic.
  • Most of the time, /proc entries are read-only files.
  • Adding files under /proc is discouraged. The recommended way of making information available in new code is via sysfs.

Implementing files in /proc

  • should include <linux/proc_fs.h>
  • When a process reads from your /proc file, the kernel allocates a page of memory (i.e., PAGE_SIZE bytes) where the driver can write data to be returned to user space.
      // returns the number of bytes of data actually placed in the page buffer
      int (*read_proc)(char *page, char **start, off_t offset, int count, int *eof, void *data);
    
    • page pointer
      • the buffer where you’ll write your data.
    • start
      • used by the function to say where the interesting data has been written in page.
      • to help with the implementation of large (greater than one page) /proc files.
      • If leave it NULL, the kernel assumes that the data has been put into page as if offset were 0.
    • offset, count
      • same meaning as for the read method.
    • eof
      • points to an integer that must be set by the driver to signal that it has no more data to return.
    • data
      • driver-specific data pointer you can use for internal bookkeeping.
  • A better way to implement large /proc files is called seq_file.
    int scull_read_procmem(char *buf, char **start, off_t offset, int count, int *eof, void *data) {
      int i, j, len = 0;
      int limit = count - 80; /* Don't print more than this */
        
      for (i = 0; i < scull_nr_devs && len <= limit; i++) {
          struct scull_dev *d = &scull_devices[i];
          struct scull_qset *qs = d->data;
          if (down_interruptible(&d->sem))
              return -ERESTARTSYS;
          len += sprintf(buf+len, "\nDevice %i: qset %i, q %i, sz %li\n",
                  i, d->qset, d->quantum, d->size);
          for (; qs && len <= limit; qs = qs->next) { /* scan the list */
              len += sprintf(buf + len, "  item at %p, qset at %p\n",
                      qs, qs->data);
              if (qs->data && !qs->next) /* dump only the last item */
                  for (j = 0; j < d->qset; j++) {
                      if (qs->data[j])
                          len += sprintf(buf + len,
                                  "    % 4i: %8p\n",
                                  j, qs->data[j]);
                  }
          }
          up(&scull_devices[i].sem);
      }
      *eof = 1;
      return len;
    }
    

An older interface

int (*get_info)(char *page, char **start, off_t offset, int count);
  • same with read_proc, but eof and data args are missing.
  • Still supported, but it could go away in the future.

Creating your /proc file

  • Connect read_proc function to an entry in the /proc hierarchy.
      struct proc_dir_entry *create_proc_read_entry(
          const char *name, mode_t mode,
          struct proc_dir_entry *base,
          read_proc_t *read_proc, void *data
      );
    
    • name
      • the name of the file to create.
    • mode
      • the protection mask for the file.
      • can be passed as 0 for a system-wide default.
    • base
      • indicates the directory in which the file should be created.
      • if base is NULL, the file is created in the /proc root.
    • read_proc
      • the read_proc function that implements the file.
    • data
      • ignored by the kernel, but passed to read_proc.
    • example
        create_proc_read_entry(
        "scullmem",
        0,      /* default mode */
        NULL,   /* parent dir */
        scull_read_procmem,
        NULL    /* client data */
        );
      
  • Entries in /proc should be removed when the module is unloaded.
      remove_proc_entry("scullmem", NULL /* parent dir */);
    
  • Nuisances of using /proc files
    • Removal of /proc entries may well happen while the file is in use, as there is no owner associated to /proc entries.
    • Registering two entries with the same name.

The seq_file interface

  • Used to make life easier for the implementation of large files under /proc.
  • To use seq_file, you must create a simple “iterator” object that can establish a position within the sequence, step forward, and output one item in the sequence.
  • include <linux/seq_file.h>
  • start, next, stop, show functions.
  • start method is always called first.
      void *start(struct seq_file *sfile, loff_t *pos);
    
    • sfile
      • can almost always be ignored.
    • pos
      • integer position indicating where the reading should start.
    • the start method used in scull
        static void *scull_seq_start(struct seq_file *s, loff_t *pos) {
            if (scull_nr_devs <= *pos)
                return NULL; /* No more to read */
            return scull_devices + *pos;
        }
      
      • The return value, if non-NULL, is a private value that can be used by the iterator implementation.
  • next function should move the iterator to the next position.
      void *next(struct seq_file *sfile, void *v, loff_t *pos);
    
    • v
      • the iterator as returned from the previous call to start or next.
    • pos
      • the current position in the file.
    • next should increment the value pointed to by pos; depending on how your iterator works, you might (though probably won’t) want to increment pos by more than one.
    • the next method used in scull.
        static void *scull_seq_next(struct seq_file *s, void *, loff_t *pos) {
            (*pos)++;
            if (scull_nr_devs <= *pos)
                return NULL;
            return scull_devices + *pos;
        }
      
  • When the kernel is done with the iterator, it calls stop to clean up. ``` c void stop(struct seq_file *sfile, void *v);
  • The seq_file code does not sleep or perform other nonatomic tasks between start and stop by design.
    • It is safe for your start to acquire semaphores or spinlocks.
  • Between start and stop, kernel calls show to actually output something interesting to the user space.
      int show(struct seq_file *sfile, void *v);
    
    • It should not use printk.
    • Special functions for seq_file output:
      • int seq_printf(struct seq_file *sfile, const char *fmt, …);
        • Equivalent to printf in seq_file implementations.
      • int seq_putc(struct seq_file *sfile, char c);
      • int seq_puts(struct seq_file *sfile, const char *s);
        • Equivalent to user-sapce putc and puts functions.
      • int seq_escape(struct seq_file *m, const char *s, const char *esc);
        • Equivalent to seq_puts.
        • Any character in s that is also found in esc is printed in octal format.
        • common value for esc is “ \t\n\\”.
      • int seq_path(struct seq_file *sfile, struct vfsmount *m, struct dentry *dentry, char *esc);
        • Used for outputting the file name associated with a given directory entry.
        • Unlikely to be useful in device drivers.
    • the show method used in scull.
        static int scull_seq_show(struct seq_file *s, void *v) {
            struct scull_dev *dev = (struct scull_dev *)v;
            struct scull_qset *d;
            int i;
      
            if (down_interruptible(&dev->sem))
                return -ERESTARTSYS;
            seq_printf(s, "\nDevice %i: qset %i, q %i, sz %li\n",
                    (int)(dev - scull_devices), dev->qset,
                    dev->quantum, dev->size);
            for (d = dev->data; d; d = d->next) { /* scan the list */
                seq_printf(s, "  item at %p, qset at %p\n", d, d->data);
                if (d->data && !d->next) /* dump only the last item */
                    for (i = 0; i < dev->qset; i++) {
                        if (d->data[i])
                            seq_printf(s, "    % 4i: %8p\n",
                                    i, d->data[i]);
                    }
            }
            up(&dev->sem);
            return 0;
        }
      
  • Packge the functions up and connect them to a file in /proc.
      static struct seq_operations scull_seq_ops = {
          .start = scull_seq_start,
          .next  = scull_seq_next,
          .stop  = scull_seq_stop,
          .show  = scull_seq_show
      };
    
      static int scull_proc_open(struct inode *inode, struct file *file) {
          return seq_open(file, &scull_seq_ops);
      }
        
      static structure file_operations scull_proc_ops = {
          .owner   = THIS_MODULE,
          .open    = scull_proc_open,
          .read    = seq_read,
          .llseek  = seq_llseek,
          .release = seq_release
      };
    
      /* lower-level
          struct proc_dir_entry *create_proc_entry(
          const char *name, modt_t mode, struct proc_dir_entry *parent);
      */
      entry = create_proc_entry("scullseq", 0, NULL);
      if (entry)
          entry->proc_fops = &scull_proc_ops;
    

The ioctl Method

  • Need another program to issue the ioctl and display the results.
  • It runs faster than reading /proc.
  • Undocumented ioctl commands are likely to remain unnoticed.

Debugging by Watching

  • User-space program
    • run a debugger on it to step through its functions.
    • add print statements.
    • run the program under strace.
  • strace command.
    • shows all the system calls issued by a user-space program, including calls, arguments and return values.
    • command-line options
      • -t
        • display the time when each call is executed.
      • -T
        • display the time spent in the call.
      • -e
        • limit the types of calls traced.
      • -o
        • redirect the output to a file.
        • defualt: stderr.

Debugging System Faults

  • “fault” doesn’t mean “panic”.
  • A fault usually results in the destruction of the current process while the system goes on working.
  • The system can panic, and it may if a fault happens outside of a process’s context or if some vital part of the system is compromised.
  • A driver error, it usually results only in the sudden death of the process using the driver.
    • Unrecoverable damage
      • when a process is destroyed, some memory allocated to the process’s context is lost. (e.g., kmalloc)
      • since the kernel calls the close operation for any open device when a process dies, your driver can release what was allocated by the open method.
  • A buggy driver can
    • leave hardware in an unusable state.
    • leave kernel resources in an inconsistent state.
    • corrupt kernel memory in random places.

Oops Messages

  • Almost any address used by the processor is a virtual address and is mapped to physical addresses through a complex structure of page tables.
  • When an invalid pointer is dereferenced, the paging mechanism fails to map the pointer to a physical address, and the processor signals a page fault to the operating systemf
  • If the address is nor valid, the kernel is not able to “page in” the missing address, it usually generates an oops if this happens while the processor is in supervisor mode.
  • An oops displays the processor status at the time of the fault.
    • contents of the CPU registers
    • other seemingly incomprehensible information.
  • fault module
    • used to demonstrate failures.
  • When you are confronted with an oops, the first thing to do is to look at the location where the problem happened.
  • On the x86 architecture, by default, the user-space stack starts just below 0xc0000000.
  • Always be on the lookout for the “slab poisoning”.
    • 0xa5a5a5a5 : forget to initialize dynamic memory somewhere.
  • can see a symbolic call stack only if kernel is built with the CONFIG_KALLSYMS option turned on.

System Hangs

  • deal with system hangs
    • prevent them beforehand
      • insert schedule invocations at strategic points.
        • not to call schedule any time that your driver is holding a spinlock.
    • be able to debug them after the fact.
      • add some print messages.
  • Sometimes the system may appear to be hung, but it isn’t.
  • magic SysRq key (Alt + SysRq + ?)
    • r
      • Turns off keyboard raw mode.
    • k
      • Invokes the “secure attention key” (SAK) function to kills all processes running on the current console.
    • s
      • Performs an emergency synchronization of all disks.
    • u
      • Umount. Attempts to remount all disks in a read-only mode.
    • b
      • Boot. Immediately reboots the systemf
      • Be sure to synchronize and remount the disks first.
    • p
      • Prints processor registers information.
    • t
      • Prints the current task list.
    • m
      • Prints memory information.
  • Magic SysRq must be explicitly enabled in the kernel configuration.
    • can be disabled at runtime with : echo 0 > /proc/sys/kernel/sysrq
    • can be enabled at runtime with : echo 1 > /proc/sys/kernel/sysrq
  • Trigger a specific sysrq action
    • echo ${cmd} > /proc/sysrq-trigger.
  • When a “live hang” occurs (driver is stuck in a loop but the system as a whole is still functioning)
    • SysRq p.
    • Kernel profiling function.
      • boot with profile=2
      • Reset the profile counters with the readprofile utility.
      • send driver into its loop.
      • Use readprofile again to see where the kernel is spending its time.
      • Another more advanced alternative is oprofile.
  • Precaution when chasing system hangs
    • mount all disks as read-only (or unmount them)
      • no risk of damaging the filesystem.
      • no risk of leaving it in an inconsistent state.
    • mount its filesystem via NFS.
  • This approach is time-consuming and should be avoided whenever possible.

Using gdb

  • Proficient use of the debugger at this level requires
    • confidence with gdb commands.
    • understanding of assembly code for the target platform.
    • ability to match source code and optimized assembly.
  • typical invocation of gdb
      gdb /usr/src/linux/vmlinux /proc/kcore
    
    • first argument is the name of the uncompressed ELF kernel executable.
    • second argument is the name of the core file.
  • gdb is not able to
    • modify kernel data.
    • set breakpoints
    • set watchpoints.
    • single-step through kernel functions.
  • symbol information for gdb
    • compile kernel with CONFIG_DEBUG_INFO- set.
    • the kernel image will be far larger.
  • Linux loadable modules are ELF-format executable images.
  • A typical module contains a dozen or more sections, but there are typically three that are relevant in a debuggin session.
    • .text
      • This section contains the executable code for the module.
    • .bss
    • .data
      • These two sections hold the module’s variables.
      • Any variable that is not initialized at compile time ends up in .bss.
      • Any variable that is initialized at compile time ends up in .data.
  • Module sections is available in /sys/module.

The kdb Kernel Debugger

  • There’s no built-in kernel debugger in linux originally.
  • Need to apply config, rebuild and reinstall kernel to active kdb.
  • when kdb is invoked :
    • nothing else should be running on system.
    • even the network.
    • single-user mode is a good idea.
  • bp <funcion>
    • breakpoint, stop at <function> next time.
  • go
    • continue running.
  • bt
    • backtrace from the stack of current process.
  • mds <var_name> <# words>
    • query # words of data starting at the location of var_name.
  • mds <addr>
    • query data at addr.
  • mm <addr> <value>
    • modify addr content to value.
  • other capabilities
    • single-stepping by instructions.
    • setting breakpoint on data access.
    • disassembling code
    • stepping through linked lists.
    • accessing register data.
    • etc.
    • manual : Documentation/kdb in kernel source tree.

The kgdb Patches

  • there are two versions of kgdb.
  • first kgdb
    • can be found at -mm kernel tree.
    • supported architecture
      • x86
      • SuperH
      • ia64
      • x86_64
      • SPARC
      • 32-bit PPC
    • communication
      • serial port
      • local-area network
        • enable ethernet mode
        • boot with the kgdboe set to indicate the IP addr from which debugging commands can originate.
    • Documentation/i386/kgdb
  • second kgdb
    • http://kgdb.sf.net/
    • not support network communication mode (under development)
    • built-in support for working with loadable modules.
    • supported architecture
      • x86
      • x86_64
      • PowerPC
      • S/390

The User-Mode Linux (UML) Port

  • runs on a virtual machine implemented on the Linux system call interface.
  • the user-mode kernel can be easily manipulated with gdb or anorther debugger.
  • bit shortcoming from the point of view of driver writers
    • the user-mode kernel has no access to the host system’s hardware.
    • UML is not yet useful for debugging drivers that have to deal with real hardware.
  • http://user-mode-linux.sf.net/

The Linux Trace Toolkit (LTT)

  • is a kernel patch, a set of related utilities that allow the tracing of events in the kernel.
    • timing information
    • create a reasonably complete picture of what happened over a given period of time.
  • used for debugging and tracking down performance problems.
  • http://www.opersys.com/LTT

Dynamic Probes (DProbes)

  • debugging tool released (under the GPL) by IBM for Linux on the IA-32 architecture.
  • can “probe” at almost any place in the system, in both user and kernel space.
  • when “the probe” is invoked, it can report information back to user space, change registers or do a number of other things.
  • after it has been built into the kernel, probes can be inserted anywhere within a running system without kernel builds or reboots.
  • it can also work with the LTT to insert new tracing events at arbitrary locations.
  • http://oss.software.ibm.com