How to write an efficient discrete event simulator?

I wrote a discrete event simulator for my research on P2P live streaming applications. I wrote my own simulator rather than using existing ones for two reasons: I need high efficiency to simulate a large P2P network, and I need detailed output, including the internal states of each peer, for analysis. The simulator is written in C++. The simulator has about 5000 lines of algorithm-laden code to simulate P2P streaming and 5000 lines to generate topology and analyze results. Most of the code are specific to the P2P live streaming schemes, but the schduler, which determines the efficiency, is general enough to be used to simulate other types of networks.

Introduction to discrete event simulator

Please refer to the following for the basic concepts and terminology of simulation.
  • Banks et al., Discrete Event System Simulation, Prentice Hall, 2001
  • White and Ingalls, Introduction to Simulation, Proc. Winter Simulation Conf., 2009
I often find it hard to understand concepts without examples. Here is a talk I gave in one of our lab's group meetings. It is based on the above two documents and provides an example to explain the concepts. The example also includes pseudo-codes for the implementation. Download the slides for the talk and the example problem used in this talk.

How to write a simulator for P2P video live streaming applications?

First, decide the level of details you want to simulate. A P2P network simulator can simulator at three levels of details. The first level is to simulate at the IP layer, such as NS2. Each host, router, and Internet link between two routers/hosts is a static entity. The enque and deque and a packet at each router and host is an event. The second level is to simulate at the application layer. Each host and Internet path between two hosts is a static entity. The enque and deque of a packet at each host is an event. The third level is to simulation the status of peers. A peer can send and receive packets if it is in the good state. The enque and deque of a packet is not an event. For P2P live streaming applications, the second level should be used.

Second, decide the programming language. Most efficient software is written in C or C++. Fortunately, this is the language I am most familiar with.

More on this topic later. I need some time to clean my code and complete documentation. Probably after finishing my thesis.

Bottlenecks of a Simulator

A simulator is a CPU-intensive program. The key to an efficient simulator is an efficient scheduler. The scheduler provides several methods to manipulate the event queue. The event queue is a queue of events sorted by the time of events. The scheduler repeatedly gets the head event of the event queue and dispatches the event (i.e., invokes the event's handler and passes the event to its handler).
  • insert: insert an event into the event queue.
  • cancel: delete an event in the event queue.
  • run: dispatch events in the event queue.
An event queue may contain hundreds of thousands of events. Most of the CPU cycles of a simulator are spent on searching the position in the event queue to insert or delete an event. In my simulator, the queue typically contains 10K-30K events and more than 80% of CPU times are spent on searching.

Tips to write an efficient scheduler

Both the list and vector can be used for the event queue. After a few experiments, I find that although vector allows binary search, the insertion and deletion take lots of time. NS2 also use linked list for event queue. I find the STL of C++ very efficient. Actually it is more efficient than the vector I wrote. So I use STL.

With list, you can only search by iteration, and hence the efficiency is determined on how quick you can find an event given its ID. NS2 searches the event queue from the beginning one-by-one, which takes lots of time when the event queue is long. In my simulation, the event queue has tens of thousands of events. With this method, to simulate 3000 peers for 10 minutes, my simulator needs to run for two days. I used two tricks to cut the running time to about three hours.

The first trick is to decide whether to start from the head or tail of the event queue: search forward if the event's time is close to the current time, and search backward if the event's time is far away from the current time. This method may improve the performance by about 30%, and it is feasible because I know the distribution of triggering times of events in the queue.

The second trick is to use multiple event queues. Because the search time is proportional to the queue size, break the event queue into N queues can improve the performance several times. This method applies to situations where the distribution of triggering times of events are unknown and can improve the performance by several folds.

Each event is hashed to a queue according to its eventID and is inserted to the corresponding event. When the schduler dispatches events, it compares the trigger times of the head event in all the queues and select the event with the earliest time. It is possible that two events have the same triggering times but have to be scheduled in order. To preserve the order, each event has an attribute called insertSn. If two head events have the same triggering time, the schduler dispatches the event with a smaller insertSn.

Samle Codes

The following is the skeleton codes of the scheduler.
class Scheduler {
public:
    Scheduler(): running(false), clock(0) { }
    virtual ~Scheduler() { };
    void run();
    void stop();
    unsigned long int insert(Event *event);
    void cancel(unsigned long int eid);
    Time getTime() const { return clock; }
private:
    bool running;               // whether the scheduler is running
    Time clock;                 // the current simulation time
    list events[SCHEDULER_N];
    unsigned int getQueue(unsigned long int eid) const { return (eid % SCHEDULER_N); }
};
unsigned long int Scheduler::insert(Event* event) {
    unsigned int n = getQueue(event->getEid());
    IsAfter isAfter(event);
    list::iterator it = std::find_if(events[n].begin(), events[n].end(), isAfter);
    event->setInsertSn(g_insertSn++);
    events[n].insert(it, event);
    return event->getEid();
}
void Scheduler::cancel(unsigned long int eid) {
    list::reverse_iterator rit;
    unsigned int n = getQueue(eid);
    rit = std::find_if(events[n].rbegin(), events[n].rend(), HasEid(eid));
    if (rit != events[n].rend()) {
        delete (*rit);
        list::iterator it = rit.base();
        it--;
        events[n].erase(it);
    }
}
void Scheduler::run() {
    running = true;
    while (running) {
        Time t = DBL_MAX;
        unsigned int n = UINT_MAX;          // index to event queue array
        for (unsigned int i = 0; i < SCHEDULER_N; i++) {
            if (events[i].empty()) continue;
            if (events[i].front()->getTime() < t) {
                t = events[i].front()->getTime();
                n = i;
            } else if (events[i].front()->getTime() == t) {
                if (events[i].front()->getInsertSn()
                        < events[n].front()->getInsertSn()) {
                    t = events[i].front()->getTime();
                    n = i;
                }
            }
        }
        if (n != UINT_MAX) {
            Event *e = events[n].front();
            events[n].pop_front();
            clock = e->getTime();
            e->getHandler()->handle(e);
            delete e;
        }
    }
}