Char Drivers

  • The running example is scull (Simple Character Utility for Loading Localities).
  • scull is not hardware dependent; it is a char driver that acts on a memory area allocated from the kernel as though it were a device.
  • scull demonstrates the interface between the kernel and char drivers and lets the user run some tests.

The Design of scull

  • The first step of driver writing is defining the capabilities (the mechanism) the driver will offer to user programs.
  • To make scull useful as a template for writing real drivers for real devices, the scull source implements the following device types:
    • scull0 to scull3
      • Each consists of a memory area that is both global and persistent.
      • Global means that if the device is opened multiple times, the data contained within the device is shared by all the file descriptors that opened it.
      • Persistent means that if the device is closed and reopened, data isn’t lost.
      • Use cp, cat, echo, shell I/O redirection to play with it.
    • scullpipe0 to scullpipe3
      • FIFO (first-in-first-out) devices.
      • If multiple processes read the same device, they contend for data.
      • Shows how blocking and non-blocking read and write can be implemented without having to resort to interrupts.
    • scullsingle
      • Similar to scull0 but it allows only one process at a time to use the driver.
    • scullpriv
      • Similar to scull0 but it is private to each virtual console (or X terminal session). (processes on each console/terminal get different memory areas)
    • sculluid
      • Similar to scull0.
      • It can be opened multiple times, but only one user at a time.
      • Returns an error of “Device Busy” if another user is locking the device.
    • scullwuid
      • Similar to scull0.
      • It can be opened multiple times, but only one user at a time.
      • Implemented with blocking open if another user is locking the device.
  • This chapter covers the internals of scull0 to scull3; the more advanced devices are covered in Chapter 6. The basic methods covered here are:
    • open
    • release
    • read
    • write

Major and Minor Numbers

  • char devices are accessed through names in the filesystem.
    • Usually in /dev/
    • identified by “c” in the first column of the output of ls -l.
    • major number identifies the driver associated with the device. (one-major-one-driver principle)
    • minor number is used by the kernel to determine exactly which device is being referred to.

The Internal Representation of Device Numbers

#include <linux/types.h> // dev_t
#include <linux/kdev_t.h> // macros
dev_t dev_num;

// make dev_t with MKDEV(int major, int minor) macro
dev_num = MKDEV(1, 6);

// get MAJ with MAJOR(dev_t dev) macro
int maj = MAJOR(dev_num);

// get MIN with MINOR(dev_t dev) macro
int min = MINOR(dev_num);
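
  • As a user-space illustration of what these macros do, the sketch below re-creates the packing under the assumption of a 32-bit dev_t with 12 bits of major number and 20 bits of minor number (the layout used since kernel 2.6). All DEMO_ names are hypothetical stand-ins; driver code should always use the real macros from <linux/kdev_t.h> rather than rely on the layout.

```c
#include <stdint.h>

typedef uint32_t demo_dev_t;            /* stand-in for the kernel's dev_t */

#define DEMO_MINORBITS 20               /* assumed: low 20 bits hold the minor */
#define DEMO_MINORMASK ((1U << DEMO_MINORBITS) - 1)

/* Pack a major/minor pair into one device number */
#define DEMO_MKDEV(ma, mi) (((demo_dev_t)(ma) << DEMO_MINORBITS) | (mi))

/* Extract the two halves again */
#define DEMO_MAJOR(dev) ((unsigned int)((dev) >> DEMO_MINORBITS))
#define DEMO_MINOR(dev) ((unsigned int)((dev) & DEMO_MINORMASK))
```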

Allocating and Freeing Device Numbers

  • One of the first things your driver will need to do when setting up a char device is to obtain one or more device numbers to work with.
      #include <linux/fs.h>
    
      // Static approach
      int register_chrdev_region(dev_t first, unsigned int count, char *name);
    
    • dev_t first
      • the beginning device number of the range you would like to allocate.
      • The minor number portion is often 0.
    • unsigned int count
      • the total number of contiguous device numbers you are requesting.
    • char *name
      • the name of the device that should be associated with this number range.
      • it will appear in /proc/devices and sysfs.
    • Returns 0 on success or a negative error code on failure.
      // Dynamic approach (recommended)
      int alloc_chrdev_region(dev_t *dev, unsigned int firstminor, unsigned int count, char *name);
    
    • dev_t *dev
      • output-only.
      • Holds the first number in your allocated range on success.
    • unsigned int firstminor
      • the requested first minor number to use. (usually 0)
    • unsigned int count
      • same as register_chrdev_region.
    • char *name
      • same as register_chrdev_region.
  • You should free them when they are no longer in use.
      void unregister_chrdev_region(dev_t first, unsigned int count);
    

Dynamic Allocation of Major Numbers

  • Prefer dynamic allocation to obtain your major device number rather than picking a number that merely appears to be free.
  • You can get your dynamic major number from /proc/devices after calling insmod.
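
  • Looking up the dynamically assigned major can be automated. The sketch below is a user-space helper, not part of scull: it scans text in the format of /proc/devices and returns the major number listed for a given driver name. The function name demo_find_char_major is hypothetical, and the parser assumes the usual layout of a "Character devices:" section followed by a "Block devices:" section.

```c
#include <stdio.h>
#include <string.h>

/* Return the major number listed for `name` in the "Character devices:"
 * section of /proc/devices-style text, or -1 if it is not listed there. */
int demo_find_char_major(const char *text, const char *name)
{
    const char *line = text;
    int in_char_section = 0;

    while (line && *line) {
        int major;
        char dev[64];

        if (strncmp(line, "Character devices:", 18) == 0) {
            in_char_section = 1;
        } else if (strncmp(line, "Block devices:", 14) == 0) {
            in_char_section = 0;
        } else if (in_char_section &&
                   sscanf(line, "%d %63s", &major, dev) == 2 &&
                   strcmp(dev, name) == 0) {
            return major;
        }
        line = strchr(line, '\n');
        if (line)
            line++;             /* step past the newline */
    }
    return -1;
}
```

  • Called with the contents of /proc/devices and the name given to alloc_chrdev_region, it yields the major number to use when creating the device nodes.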

Some Important Data Structures

  • Most fundamental driver operations involve three important kernel data structures:
    • file_operations
    • file
    • inode

File Operations

  • The file_operations structure is how a char driver connects its operations to its device numbers: each field points to the function that implements one operation.
      struct file_operations scull_fops = {
          .owner =    THIS_MODULE,
          .open =     scull_open,     /* self-defined function */
          .release =  scull_release,  /* self-defined function */
          .read =     scull_read,     /* self-defined function */
          .write =    scull_write,    /* self-defined function */
      };
    
    • struct module *owner
      • A pointer to the module that “owns” the structure.
      • Initialized to THIS_MODULE almost all the time.
    • int (*open) (struct inode *, struct file *);
      • The first operation performed on the device file.
    • int (*release) (struct inode *, struct file *);
      • Invoked when the file structure is being released.
    • ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
      • Used to retrieve data from the device.
    • ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
      • Sends data to the device.
    • Standard C tagged structure initialization syntax.
      • makes driver more portable across changes in the definitions of the structures.
      • makes the code more compact and readable.
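
  • The effect of tagged initialization can be tried in user space. In the sketch below, struct demo_ops and its functions are hypothetical stand-ins for file_operations: members that are not named in the initializer are implicitly NULL, the order of the named members does not matter, and a caller can detect an unimplemented method by checking for NULL. Returning -EINVAL for a missing read method mirrors what the read system call does.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for struct file_operations */
struct demo_ops {
    int  (*open)(void);
    int  (*release)(void);
    long (*read)(char *buf, size_t count);
    long (*write)(const char *buf, size_t count);
};

int demo_open(void)    { return 0; }
int demo_release(void) { return 0; }

/* Only open and release are named (in arbitrary order);
 * read and write are implicitly initialized to NULL. */
const struct demo_ops demo_fops = {
    .release = demo_release,
    .open    = demo_open,
};

/* Dispatch through the table, failing cleanly if read is unimplemented */
long demo_call_read(const struct demo_ops *ops, char *buf, size_t count)
{
    if (!ops->read)
        return -EINVAL;
    return ops->read(buf, count);
}
```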

The file Structure

  • Has nothing to do with C library FILE pointer.
  • The file structure represents an open file.
  • We’ll consistently call the pointer filp to prevent ambiguities with the structure itself.
  • Important fields of struct file:
    • mode_t f_mode;
      • It identifies the file as either readable or writable (or both).
    • loff_t f_pos;
      • The current reading or writing position.
    • unsigned int f_flags;
      • File flags, such as O_RDONLY, O_NONBLOCK, and O_SYNC.
      • <linux/fcntl.h>
    • struct file_operations *f_op;
      • The operations associated with the file.
    • void *private_data;
      • It is a useful resource for preserving state information across system calls.
      • You must remember to free that memory in the release method before the file structure is destroyed by the kernel.
    • Drivers never create file structures; they only access structures created elsewhere.
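
  • One detail about f_flags is easy to get wrong: the access mode is not a set of independent bits. On Linux O_RDONLY is 0, so the mode has to be masked with O_ACCMODE and compared, which is the same test a driver's open method can apply to filp->f_flags. A user-space sketch with hypothetical helper names:

```c
#include <fcntl.h>

/* True if the open flags request write-only access.
 * Other flag bits such as O_NONBLOCK are masked away first. */
int demo_is_write_only(int flags)
{
    return (flags & O_ACCMODE) == O_WRONLY;
}

/* True if the open flags request read-only access.
 * Testing (flags & O_RDONLY) would not work where O_RDONLY is 0. */
int demo_is_read_only(int flags)
{
    return (flags & O_ACCMODE) == O_RDONLY;
}
```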

The inode Structure

  • The inode structure is used by the kernel internally to represent files. There can be numerous file structures representing multiple open descriptors on a single file, but they all point to a single inode structure.
  • Two fields of this structure are of interest for writing driver code:
    • dev_t i_rdev;
      • For inodes that represent device files, this field contains the actual device number.
    • struct cdev *i_cdev;
      • It is the kernel’s internal structure that represents char devices.
    • Instead of manipulating i_rdev directly, you can obtain the major and minor number from an inode by:
        unsigned int iminor(struct inode *inode);
        unsigned int imajor(struct inode *inode);
      

Char Device Registration

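  • Two ways of allocating and initializing a struct cdev.
      #include <linux/cdev.h>

      /* 1. Obtain a standalone cdev structure at runtime */
      struct cdev *my_cdev = cdev_alloc();
      my_cdev->ops = &my_fops;

      /* 2. Initialize a cdev embedded in your own device structure */
      void cdev_init(struct cdev *cdev, struct file_operations *fops);

  • After allocating and initializing, set the owner field of struct cdev to THIS_MODULE.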
  • The final step is to tell the kernel about it.
      int cdev_add(struct cdev *dev, dev_t num, unsigned int count);
    
    • struct cdev *dev
      • The cdev structure.
    • dev_t num
      • The first device number.
    • unsigned int count
      • the number of device numbers. (often 1)
    • cdev_add can fail.
    • The device is “live” and its operations can be called by the kernel as soon as cdev_add succeeds, so do not call cdev_add until your driver is completely ready to handle operations on the device.
  • To remove a char device from the system:
      void cdev_del(struct cdev *dev);
    
    • Do not access the cdev structure after passing it to cdev_del.

Device Registration in scull

  • The structure of scull.
      struct scull_dev {
          struct scull_qset *data;    /* Pointer to first quantum set */
          int quantum;                /* the current quantum size */
          int qset;                   /* the current array size */
          unsigned long size;         /* amount of data stored here */
          struct mutex lock;          /* mutual exclusion lock */
          struct cdev cdev;           /* Char device structure */
      };
    
  • Initialize the cdev of each scull device and add it to the system.
      static void scull_setup_cdev(struct scull_dev *dev, int index) {
          int err, devno = MKDEV(scull_major, scull_minor + index);
    
          cdev_init(&dev->cdev, &scull_fops);
          dev->cdev.owner = THIS_MODULE;
          err = cdev_add(&dev->cdev, devno, 1);
          /* Fail gracefully if need be */
          if (err)
              printk(KERN_NOTICE "Error %d adding scull%d\n", err, index);
      }
    

open and release

The open Method

  • open should perform the following tasks:
    • Check for device-specific errors (such as device-not-ready or similar hardware problems)
    • Initialize the device if it is being opened for the first time
    • Update the f_op pointer, if necessary.
    • Allocate and fill any data structure to be put in filp->private_data.
  • Prototype of open
      int (*open)(struct inode *inode, struct file *filp);
    
  • Get the scull_dev structure from the inode
      #include <linux/kernel.h>
        
      struct scull_dev *dev; /* device information */
    
      dev = container_of(inode->i_cdev, struct scull_dev, cdev);
      filp->private_data = dev; /* for other methods in the future */
    
    • container_of(pointer, container_type, container_field);
      • This macro takes a pointer to a field named container_field within a structure of type container_type, and returns a pointer to the containing structure.
  • The scull_open function:
      int scull_open(struct inode *inode, struct file *filp) {
          struct scull_dev *dev; /* device information */
    
          dev = container_of(inode->i_cdev, struct scull_dev, cdev);
          filp->private_data = dev; /* for other methods in the future */
    
          /* now trim to 0 the length of the device if open was write-only */
          if ( (filp->f_flags & O_ACCMODE) == O_WRONLY) {
              if (mutex_lock_interruptible(&dev->lock))
                  return -ERESTARTSYS;
              scull_trim(dev); /* ignore errors */
              mutex_unlock(&dev->lock);
          }
          return 0; /* success */
      }
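
  • The container_of lookup used in scull_open can be modelled in user space with offsetof. The sketch below is a simplified version without the type checking of the kernel macro; demo_container_of, struct demo_cdev and struct demo_dev are hypothetical stand-ins for the kernel types.

```c
#include <stddef.h>

/* Simplified container_of: subtract the member's offset from its address */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_cdev { int dummy; };        /* stand-in for struct cdev */

struct demo_dev {
    int quantum;
    struct demo_cdev cdev;              /* embedded structure, not a pointer */
};

/* Recover the enclosing device from a pointer to its embedded cdev */
struct demo_dev *demo_dev_from_cdev(struct demo_cdev *cdev)
{
    return demo_container_of(cdev, struct demo_dev, cdev);
}
```

  • This only works because the cdev is embedded in the device structure; with a separately allocated cdev there is no enclosing structure to step back to.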
    

The release Method

  • The reverse of open.
  • Some drivers name the implementation device_close instead of device_release.
  • release should perform the following tasks:
    • Deallocate anything that open allocated in filp->private_data.
    • Shut down the device on last close.
  • The close system call executes the release method only when the counter for the file structure drops to 0.
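
  • The counting rule can be pictured with a small user-space model. struct demo_file and its helpers are hypothetical and only mimic the bookkeeping: dup and fork add a reference to an existing file structure instead of creating a new one, and release runs only when the last reference is closed.

```c
/* Hypothetical model of the usage counter kept in a file structure */
struct demo_file {
    int count;      /* number of descriptors sharing this open file */
    int released;   /* how many times release has run */
};

/* dup() or fork(): another descriptor now shares the same open file */
void demo_get(struct demo_file *f)
{
    f->count++;
}

/* close(): drop one reference; run release only for the last one */
void demo_close(struct demo_file *f)
{
    if (--f->count == 0)
        f->released++;      /* stand-in for calling the release method */
}
```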

scull’s Memory Usage

#include <linux/slab.h>

void *kmalloc(size_t size, int flags);
void kfree(void *ptr);
  • A call to kmalloc attempts to allocate size bytes of memory; the return value is a pointer to that memory or NULL if the allocation fails. The flags argument is used to describe how the memory should be allocated.
  • kmalloc is not the most efficient way to allocate large areas of memory.
  • The layout of a scull device

      struct scull_qset {
          void **data;
          struct scull_qset *next;
      };
    
  • scull_trim function:
      int scull_trim(struct scull_dev *dev) {
          struct scull_qset *next, *dptr;
          int qset = dev->qset; /* "dev" is not-null */
          int i;
    
          for (dptr=dev->data; dptr; dptr=next) { /* all the list items */
              if (dptr->data) {
                  for (i=0; i<qset; i++)
                      kfree(dptr->data[i]);
                  kfree(dptr->data);
                  dptr->data = NULL;
              }
              next = dptr->next;
              kfree(dptr);
          }
          dev->size = 0;
          dev->quantum = scull_quantum;
          dev->qset = scull_qset;
          dev->data = NULL;
          return 0;
      }
    

read and write

  • The read and write prototypes:
      ssize_t read(struct file *filp, char __user *buff, size_t count, loff_t *offp);
      ssize_t write(struct file *filp, const char __user *buff, size_t count, loff_t *offp);
    
    • struct file *filp
      • The file pointer.
    • size_t count
      • the size of requested data transfer.
    • char __user *buff
      • Pointer to the user-space buffer.
      • (write) It holds the data to be written.
      • (read) The empty buffer where the newly read data should be placed.
    • loff_t *offp
      • A pointer to a “long offset type” object that indicates the file position the user is accessing.
  • To ensure that data transfers between kernel and user space happen in a safe and correct way, use the following functions. Both return the number of bytes that could not be copied, so 0 means success:
      #include <linux/uaccess.h>  /* <asm/uaccess.h> on older kernels */
    
      unsigned long copy_to_user(void __user *to, const void *from, unsigned long count);
      unsigned long copy_from_user(void *to, const void __user *from, unsigned long count);
    
  • The task of the read method is to copy data from the device to user space.
  • The write method must copy data from user space to the device.
  • Whatever the amount of data the methods transfer, they should generally update the file position at *offp.

The read Method

  • The return value (retval) for read:
    • retval == count
      • The requested number of bytes has been transferred. This is the optimal case.
    • 0 < retval < count
      • Only part of the data has been transferred.
      • Most often, the application program retries the read.
    • 0 == retval
      • End-of-file was reached.
    • retval < 0
      • There was an error.
      • Check <linux/errno.h>
      • Typical values
        • -EINTR (interrupted system call)
        • -EFAULT (bad address)
  • The read function
      ssize_t scull_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos) {
          struct scull_dev *dev = filp->private_data;
          struct scull_qset *dptr; /* the first listitem */
          int quantum = dev->quantum, qset = dev->qset;
          int itemsize = quantum * qset; /* how many bytes in the listitem */
          int item, rest, s_pos, q_pos;
          ssize_t retval = 0;
    
          if (mutex_lock_interruptible(&dev->lock))
              return -ERESTARTSYS;
          if (dev->size <= *f_pos)
              goto out;
          if (dev->size < *f_pos + count)
              count = dev->size - *f_pos;
    
          /* find listitem, qset index, and offset in the quantum */
          item = (long)*f_pos / itemsize;
          rest = (long)*f_pos % itemsize;
          s_pos = rest / quantum; q_pos = rest % quantum;
    
          /* follow the list up to the right position (defined elsewhere) */
          dptr = scull_follow(dev, item);
    
          if (dptr == NULL || !dptr->data || !dptr->data[s_pos])
              goto out; /* don't fill holes */
            
          /* read only up to the end of this quantum */
          if (quantum - q_pos < count)
              count = quantum - q_pos;
    
          if (copy_to_user(buf, dptr->data[s_pos] + q_pos, count)) {
              retval = -EFAULT;
              goto out;
          }
          *f_pos += count;
          retval = count;
    
      out:
          mutex_unlock(&dev->lock);
          return retval;
      }
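
  • The position arithmetic in scull_read can be checked in isolation. The sketch below repeats the same divisions in user space; demo_locate and demo_clamp are hypothetical helper names, and the example values in the note assume a quantum of 4000 bytes and a qset of 1000 pointers.

```c
struct demo_pos {
    long item;      /* index of the scull_qset in the linked list */
    long s_pos;     /* index of the quantum pointer within that qset */
    long q_pos;     /* byte offset within that quantum */
};

/* Split a file position into list item, quantum index and byte offset */
struct demo_pos demo_locate(long f_pos, int quantum, int qset)
{
    long itemsize = (long)quantum * qset;   /* bytes per list item */
    long rest = f_pos % itemsize;
    struct demo_pos p = {
        .item  = f_pos / itemsize,
        .s_pos = rest / quantum,
        .q_pos = rest % quantum,
    };
    return p;
}

/* Bytes a single call may transfer: never past the end of the quantum */
long demo_clamp(long count, long q_pos, int quantum)
{
    return (quantum - q_pos < count) ? quantum - q_pos : count;
}
```

  • With those sizes, position 4008005 falls in list item 1, quantum 2, byte 5, and a request for 100 bytes starting 10 bytes before the end of a quantum is cut down to 10.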
    

The write Method

  • The return value (retval) for write:
    • retval == count
      • The requested number of bytes has been transferred.
    • 0 < retval < count
      • Only part of the data has been transferred.
      • The program will most likely retry writing the rest of the data.
    • 0 == retval
      • Nothing was written.
      • The exact meaning of this case is examined in Chapter 6.
    • retval < 0
      • An error occurred; as for read.
  • The write function
      ssize_t scull_write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos) {
          struct scull_dev *dev = filp->private_data;
          struct scull_qset *dptr;
          int quantum = dev->quantum, qset = dev->qset;
          int itemsize = quantum * qset;
          int item, rest, s_pos, q_pos;
          ssize_t retval = -ENOMEM; /* value used in "goto out" statements */
        
          if (mutex_lock_interruptible(&dev->lock))
              return -ERESTARTSYS;
    
          /* find listitem, qset index and offset in the quantum */
          item = (long)*f_pos / itemsize;
          rest = (long)*f_pos % itemsize;
          s_pos = rest / quantum; q_pos = rest % quantum;
    
          /* follow the list up to the right position */
          dptr = scull_follow(dev, item);
          if (dptr == NULL)
              goto out;
          if (!dptr->data) {
              dptr->data = kmalloc(qset * sizeof(char *), GFP_KERNEL);
              if (!dptr->data)
                  goto out;
              memset(dptr->data, 0, qset * sizeof(char *));
          }
          if (!dptr->data[s_pos]) {
              dptr->data[s_pos] = kmalloc(quantum, GFP_KERNEL);
              if (!dptr->data[s_pos])
                  goto out;
          }
          /* write only up to the end of this quantum */
          if (quantum - q_pos < count)
              count = quantum - q_pos;
    
          if (copy_from_user(dptr->data[s_pos] + q_pos, buf, count)) {
              retval = -EFAULT;
              goto out;
          }
          *f_pos += count;
          retval = count;
    
          /* update the size */
          if (dev->size < *f_pos)
              dev->size = *f_pos;
        
      out:
          mutex_unlock(&dev->lock);
          return retval;
      }
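
  • Because scull_write stops at the end of the current quantum, a single write call on a large buffer returns a short count. A user program that wants everything written therefore has to loop. The following user-space sketch shows one way to do that; demo_write_all is a hypothetical helper, not part of scull.

```c
#include <errno.h>
#include <unistd.h>

/* Write all len bytes, looping over partial writes.
 * Returns 0 on success, -1 on error (errno is set by write). */
int demo_write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;
        }
        buf += n;               /* skip what the driver accepted */
        len -= (size_t)n;
    }
    return 0;
}
```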
    

readv and writev

  • These “vector” versions of read and write take an array of structures, each of which contains a pointer to a buffer and a length value.
  • If your driver does not supply methods to handle the vector operations, readv and writev are implemented with multiple calls to your read and write methods.
  • The prototypes for the vector operations are:
      ssize_t (*readv) (struct file *filp, const struct iovec *iov, unsigned long count, loff_t *ppos);
      ssize_t (*writev) (struct file *filp, const struct iovec *iov, unsigned long count, loff_t *ppos);
    
  • The iovec struct:
      #include <linux/uio.h>
    
      struct iovec {
          void __user *iov_base;
          __kernel_size_t iov_len;
      };
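
  • From the application side, a vector write looks like the user-space sketch below, which sends two separate buffers with one writev call. demo_gather_write is a hypothetical helper; note that the user-space struct iovec from <sys/uio.h> uses plain void * and size_t for its two members.

```c
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send a header and a body with one system call instead of two.
 * Returns the total number of bytes written, or -1 on error. */
ssize_t demo_gather_write(int fd, const char *hdr, const char *body)
{
    struct iovec iov[2] = {
        { .iov_base = (void *)hdr,  .iov_len = strlen(hdr)  },
        { .iov_base = (void *)body, .iov_len = strlen(body) },
    };

    return writev(fd, iov, 2);
}
```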
    

Playing with the New Devices

  • Once you are equipped with the four methods (open, release, read, write) just described, the driver can be compiled and tested.
  • You can try using cp, dd, and input/output redirection to test out the driver.
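
  • When testing with your own program instead of cp or dd, remember that scull returns at most one quantum per read, so short reads are normal. The user-space sketch below keeps reading until the requested amount has arrived or end-of-file is reached; demo_read_full is a hypothetical helper name.

```c
#include <errno.h>
#include <unistd.h>

/* Read until len bytes have arrived or end-of-file is reached.
 * Returns the number of bytes read, or -1 on error. */
ssize_t demo_read_full(int fd, char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);

        if (n == 0)
            break;              /* end-of-file */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted: retry */
            return -1;
        }
        done += (size_t)n;      /* partial read: ask for the rest */
    }
    return (ssize_t)done;
}
```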