Saturday, May 16, 2009

A new method to distinguish between ring buffer full and empty

One design choice for ring buffer implementation is to distinguish
between full buffer and empty buffer. Traditional methods including
waste one entry for full (head + 1 == tail), or use
another flag for full. A new method is implemented as follow:

#define RING_LEN   16
#define RING_LIMIT (2 * RING_LEN - 1)

struct ring_entry
{
...
};

struct ring
{
int head;
int tail;
struct ring_entry entries[RING_LEN];
};

int ring_index(int n)
{
if (n > RING_LEN)
return n - RING_LEN;
else
return n;
};

int ring_get(struct ring *ring, struct ring_entry *entry)
{
/* empty */
if (ring->head == ring->tail)
return 0;

*entry = ring->entries[ring_index(ring->tail)];

ring->tail++;
if (ring->tail > RING_LIMIT)
ring->tail = 0;
return 1;
}

int ring_put(struct ring *ring, struct ring_entry *entry)
{
int ihead, itail;

ihead = ring_index(ring->head);
itail = ring_index(ring->tail);

/* full */
if (ihead == itail && ring->head != ring->tail)
return 0;

ring->entries[ihead] = *entry;

ring->head++;
if (ring->head > RING_LIMIT)
ring->head = 0;
return 1;
}

Buffers in Linux kernel

For 2.6.30

Oprofile Buffer

  • One cpu_buffer for each CPU
    • NMI or IRQ handler add record to cpu_buffer
    • Timer based per-cpu work queue remove record from cpu_buffer
    • One reader and one writer, so it is easy for mutual exclusion and synchronization
    • fix size, may overflow if write too much or not removed quickly enough
  • One event_buffer
    • Timer based per-cpu work queue add record to event_buffer from that of cpu_buffer
    • User space program remove record from event_buffer. User space program is waken up based on threshold.
    • One reader and multiple writer, mutex is used on write side, because writers are in process context (work queue).
    • fix size, may overflow if write too much or not read out quickly enough
  • Synchronization between cpu_buffer, event_buffer and use space program
    • Used for profile, so record flow bandwidth is predictable
      • Maximum bandwidth for cpu_buffer: CPU number * (cpu_buffer size / timer interval)

Unified trace ring buffer

  • One ring_buffer_per_cpu for each CPU
    • In fact two levels of records, the first level is struct buffer_page, the second level is struct ring_buffer_event.
    • Preempt disabling is used to provide mutual exclusion between process contexts.
    • Inside one struct buffer_page, atomic operation (local_add_return) is used to provide mutual exclusion for struct ring_buffer_event. Because process context can only be preempted by IRQ and NMI, and process context can not continue until IRQ and NMI finishes. The length of struct buffer_page is known (local_add_return may exceed buffer_page length).
    • In writer side, "write" indicates allocated records, while "commit" indicates completed records.
    • spinlock (ring_buffer_per_cpu->lock) and IRQ disable is used to provide mutual exclusion for struct buffer_page (reader and writer). In NMI environment, if try_spin_lock failed, the record is discarded, this acceptable for tracing.
    • At least 3 pages are needed for each CPU, 2 buffer_page, 1 reader page.
    • ring_buffer_per_cpu->reader_lock is used to provide mutual exclusion between multiple readers, while ring_buffer_per_cpu->lock is used to provide mutual exclusion between reader and writer.
    • fix size, may overflow if write too much or not consumed quickly enough
    • Iterator (iter) can be used on reader side

Mcelog buffer

  • One global buffer with fixed record length, fixed size. May overflow.
  • Writer side lock-less, implemented using cmpxchg() + finished flag. Records are added in NMI/timer context.
  • Memory order should be considered explicitly because of lock-less
  • Reader side is protected by mutex, because normally there is only one reader. Records are removed in process context.
  • Multiple writers, one reader, throughput bottleneck lies in reader side.

SMP preemption -> UP preemption

Sometimes, a SMP environment can turn into a UP environment via per-CPU data structure. Such as unified trace ring buffer is a per-CPU data structure, so the mutual exclusion can be considered in UP model for each per-CPU buffer.

Preemption in UP

There are 3 kinds of contexts in UP environment.

  1. Process context
  2. IRQ context
  3. NMI context

Where 1 can be preempted by 1, 2 or 3; 2 can be preempted by 2 or 3; 3 can not be preempted.

If preempt is off in 1, 1 can only be preempted by 2 or 3. And there is no interleave between contexts, that is, if 1 is preempted by 2, 1 will not be resumed until 2 is finished, so do 2 and 3. This can be used to develop more efficient synchronization algorithm.

If IRQ is off in 1 and 2, 1 and 2 can only be preempted by 3, without interleave.

Saturday, May 9, 2009

Reading notes of on submit kennel patches (OLS)

!!! Important kernel design point !!!

Very generic notifiers and hooks tend to have a large maintenance impact because they have the potential to alter the code flow in unexpected ways, lead to lock order problems, bugs, and unexpected behavior, and generally making code harder to maintain and debug later.

Linux kernel is a lock-based complex parallel system, the main method to prevent ABBA style dead-lock is enforcing lock order. But notifiers or hooks makes it hard to audit the order of locks.

Notifiers or hooks in lower layer to call functions on higher layer should be avoided. One method to call higher layer function in lower layer is via work queue like asynchronous mechanism.

Notifiers or hooks in higher layer to call functions on lower layer is OK. Such as struct file_operations is collections of function pointers (a kind of hooks?), it is only used as some kind of abstraction for lower layer, make adding new lower layer component more convenient.

Maintenance

For a project which is very complex like Linux kernel, (seems that most projects become more and more complex as time goes by), maintenance is something in the core of Linux kernel development.

To make code maintainable, firstly, code should be made readable. The code is clean-upped and redesigned again and again to make it cleaner, thus more readable.

It is much easier to add a component under the current design. Such as add a new device driver. If you need something to be changed in subsystem core, you should consider the design, maybe you need clean-up or redesign some part of the subsystem core to make code maintainable.

A patch submitter should maintain his code at least for a short while after merging.

Interface designs for user space programs is very important too. It should be stable for imaginable future.

Changing a published interface later is much harder and often special backwards compatiblility code is required.

Friday, May 8, 2009

Why MID?

What I need for a Mobile device

  • Fast from picking up to using (Fast boot)
  • light, easy for taking and hand holding
  • Applications
    • Web browsering
    • Electric book
    • Online video and offline video
    • PIM
    • Game
    • GPS?

MID vs Notebook

pros:
  • Fast from picking up to using: fast boot time or always suspend to RAM instead of power off.
  • lighter, easy for taking and hand holding
cons:
  • Screen is too small for some applications, especially most web page, PDF, DJVU ebooks
  • Notebook has better Linux support, some application has best usability on Notebook.

MID vs Smartphone

pros:
  • Screen is bigger, although smaller than Notebook, much better than smartphone's.
  • OS is compatible with Desktop ones (Linux or Windows), most desktop applications can run on MID. Although Linux based smartphone can run most desktop applications too, the resource of most smartphone is not enough for desktop application.
  • Computing resource of MID (CPU, RAM, storage) is much better than smartphone.
cons:
  • Smartphone is all-in-one solution, that is, you can bring just smartphone or smartphone AND MID.
  • Smartphone has GPRS support in China, while most MID has not.
  • Some smartphone application is good for small screen, such as ucweb for web browsing. Some desktop application is not good for small screen, even bigger screen of MID.

Comparison

Notebook(X61T) MID (no yet) Smartphone(N78) PDA (Zaurus)
Fast usage 1 4.5 5 4.5
Weight 1 4.5 5 5
Browsing 5 4 ~ 4.5 3 (ucweb) 2
EBook 5 3 ~ 3.5 2.5 (screen) 3
Online Video 5 4.5 0 (no WIFI) 0 (too slow)
Offline Video 5 4.5 4 3.5
PIM 4 4 4 3
Game 5 5 2 (keypad) 4
GPS 0 5 (maybe) 5 0

Conclusion

  • Although MID should be better than smartphone, there is no good choice until now. Aigo or Vilivs is too expensive. Smart Q5 is not better than Zaurus (No stable software too). Most functions of MID can be accomplished with smartphone.
  • Notebook + smartphone is good for now, waiting for better MID.