News

Useful command-line tools

To start off a brand new year, I wanted to share a few productivity tools that I started using lately: tmux, ack, htop. Hopefully this makes this year faster and better than 2012!

  • tmux is a drop-in replacement for GNU screen, that most-importantly allows vertical splitting — good for large monitors. Additionally, the documentation is easier to find, it is easy to configure and is actively developed. Sadly its not pre-installed on the many machines, but I can help you save time with configuring it. tmux does much more but I haven’t had the time to explore.

  • ack is an intelligent grep for source code searches — highlights the search term, groups occurrences within a file, ignores .svn/.hg junk, etc. (Thanks to Alex Rasmussen for suggesting this)

  • htop is a more colourful and informative top. For starters, it displays CPU and memory usages as bars making it much easier to interpret (I spend lots of time interpreting system memory usage, cache, etc.). It also displays full process names, and has an easy-to-configure interface.

  • synergy is a tool to share the mouse and keyboard across computers (not really a console tool). Use this to share these devices across Linux, MacOSX and Windows, without having to move your hands across from desktops to laptops and back. Please refer to my G+ post for details on the ways I use it.

Posted in Productivity | Tagged , | Leave a comment

Ruby Interoperability

Inspired by hearing so much about Ruby lately, I took the time to learn a bit of Ruby. Of course, as a core Mace developer, my first thought of what I could do with Ruby was to write an application that talked to a Mace application! I’ve written up an example which is now available in the Mace documentation (http://www.macesystems.org/wiki/documentation:rubypingpong) that illustrates using Ruby to talk to a simple Mace application, and through it illustrates a bit of the Mace networking and wire protocol.

Is there interest in greater support or documentation for Ruby (or other language) in a Mace environment?

Posted in Uncategorized | 2 Comments

Composable Reliability for Asynchronous Systems: Treating Failures as Slow Processes (USENIX ATC 2012)

Conventional distributed systems wisdom is to treat slow nodes the same as failed nodes, through the use of leases and timeouts, handling merely slow nodes by effectively rebooting or restarting them.

While this wisdom has led to simpler systems that can effectively handle high churn while also continuing to function in the face of high network delays, it is ill suited to more modern managed environments, where many processes are co-located in a small geographic region such as a campus or data center. In these environments, delays are commonly just that—delays. Moreover, a failed process is likely to be restarted quite quickly, so that if state is effectively maintained, it need not be erased as it would by conventional wisdom.

In our paper, we propose that conventional wisdom be rebooted for managed distributed systems: that we should instead treat failed nodes as slow nodes, allowing the system to more gracefully handle common scenarios such as power failures.

We present Ken protocol, a lightweight distributed reliability model that is well suited to developing highly-survivable applications for these environments and allows the developer to focus on the crash-free behavior of applications.

We also demonstrate how this model can be effortlessly integrated with Mace toolkits for building large-scale distributed systems, yielding MaceKen, to support survivable application development. Moreover, this model allows multiple, independently developed application components to be seamlessly composed, preserving the reliability benefits across systems.

You can download MaceKen in here.

Sunghwan Yoo, Charles Killian, Terence Kelly, Hyoun Kyu Cho, and Steven Plite.  Composable Reliability for Asynchronous Systems: Treating Failures as Slow Processes, In proceedings of 2012 USENIX Annual Technical Conference (USENIX ATC ’12). Boston, MA. (to appear) 13-15 June, 2012.

Posted in Papers | Leave a comment

Structured Comparative Analysis of Systems Logs to Diagnose Performance Problems (NSDI 2012)

This paper describes our work on Distalyzer: a tool for automatically diagnosing performance problems in distributed systems. It was accepted for publication at NSDI 2012, and is work done by Karthik Nagaraj, Charles Killian and Jennifer Neville.

Diagnosis and correction of performance issues in modern, large-scale distributed systems can be a daunting task, since a single developer is unlikely to be familiar with the entire system and it is hard to characterize the behavior of a software system without completely understanding its internal components. Moreover, distributed systems are extremely complex because of the innate complexity of their code, combined with the network that can cause unpredictable delays and orderings.

This paper describes Distalyzer, an automated tool to support developer investigation of performance issues in distributed systems. We aim to leverage the vast log data available from large scale systems, while reducing the level of knowledge required for a developer to use our tool. Specifically, given two sets of logs, one with good and one with bad performance, Distalyzer uses machine learning techniques to compare system behaviors extracted from the logs and automatically infer the strongest associations between system components and performance.

We’ve released the source code and logs used in the paper, and put up the final version at Distalyzer’s webpage.

Posted in Papers | Tagged , , , , | Leave a comment

Gatling: Automatic Attack Discovery in Large-Scale Distributed Systems (NDSS 2012)

Most distributed systems are designed to meet application-prescribed metrics that ensure availability and high-performance for practical usage. However compromised participants can manipulate protocol semantics through attacks that target the messages exchanged with honest nodes and degrade performance significantly. To date, finding attacks against performance has been primarily a manual task due to both the difficulty of expressing performance as an invariant in the system and the state-space explosion that occurs as attackers are more realistically modeled. Our design goal is to find performance attacks automatically requiring the least amount of user effort.

In our paper, we propose Gatling, a framework that automatically finds performance attacks caused by insider attackers in large-scale message-passing distributed systems. We define performance attacks as following; malicious nodes deviate from the protocol when sending or creating messages, with the goal of degrading system performance. Gatling is a framework that combines a model checker and simulator environment with a fault injector to find performance attacks in event-based message passing distributed systems. Continue reading

Posted in Papers | Leave a comment

Hierarchy-Aware Distributed Overlays in Data Centers using DC2 (COMSNETS 2012)

Today’s data center architectures are often built in the form of multi-rooted tree topologies with with less overall bandwidth at higher levels of the tree than at the bottom. In datacenter parlance, this is referred to as the over-subscription factor, and according to literature, many existing datacenters are heavily oversubscribed with ratios of about 240:1 or 80:1. This clearly means that the links at the top of the tree are more scarce resources than at the bottom of the tree. Specifically, the bandwidth that is typically available within a rack (through Top-Of-Rack switches) is significantly higher than the bandwidth available at the core routers.

The current trend of Internet services is to run on distributed nodes in gigantic data centers comprising 100s of thousands of machines spanning multiple continents. Scalable multicast trees and distributed key-value stores form the key building blocks for these applications, relying on a self-organizing overlay among the nodes (Chord, Pastry, Bamboo). However, a straightforward implementation of many existing distributed overlays would pay no heed to this information, and would lead to sub-optimal routes that cross the top of the tree—often many times over.

This paper presents a framework called DC2 (short for data center aware distributed communication) that incorporates the hierarchical information specific to data center environments in its overlay routing. This helps applications to limit communication locally within clusters (e.g. racks) and cross clusters only when necessary. To show the generality of DC2, we build two systems on top of DC2— an overlay multicast system called DC2-Multicast and a key- value object store called DC2-Store, that both minimize cost and increase scalability.

Karthik Nagaraj, Hitesh Khandelwal, Charles Killian, Ramana Rao Kompella. Hierarchy-Aware Distributed Overlays in Data Centers using DC2. In Proceedings of the 4th International Conference on Communication Systems and Networks (COMSNETS 2012). Jan 3-7, Bangalore, India. (to appear)

Posted in Papers | Tagged , , , | Leave a comment

MacePC publicly available

We’d  like to announce that the source code for the MacePC paper published at FSE last year is now publicly available through the Mace SVN repository.

Instructions on downloading Mace and MacePC usage are available here.

Posted in Uncategorized | Leave a comment

InContext: Simple Parallelism for Distributed Applications (HPDC 2011)

The event-driven model is a popular framework for those who want to build a complex distributed application while keeping the simple event-driven system features where the lower details can be hidden.

However, introducing parallelism in the event-driven system has not been an easy task since it has to solve the concurrency problem between various events sharing state variables.

Assume that we have two events, event_a() and event_b() as follows.

state_variable v; event_a() { read(v); write(v, 1); } event_b() { read(v); // do something read(v); }

What happens when we run those two events in parallel? write(v,1) statement in event_a() can occur between the two read(v) statements in event_b(). This concurrent access will violate programmer expectations.

The simplest way to overcome this issue is not allowing parallelism at all. This is called the atomic event model, which Mace has used since its creation due to its simplicity.

However, what if we have a service that has a recurring event to be handled in timely manner? One good example is a heartbeat(ping) message to check the liveness of each node. If we have a event that takes a long time to complete, the heartbeat message may not be processed in time while the long-running event is handled.

The ideal solution (as suggested in InContext model) is allowing parallelism in a service so that each event can be handled separately in different threads. This will increase the responsiveness of the system that has a background service like ping messages to check liveness.

The problem is to how to deal with concurrent access to shared variables. How can we safely run the multiple events in parallel without breaking concurrency? Of course we can take another approach like SEDA by calling other methods by queuing events to be processed. However, one cannot simply parallelize their events with SEDA and it may require a lot of restructuring work on existing system. Is there a simpler algorithm?

InContext provides a solution for those who seek parallelism in event driven system while keeping simplicity. Continue reading

Posted in Papers | Leave a comment

Finding Latent Performance Bugs in Systems Implementations (FSE 2010)

Performance is one of the later but important component in building efficient Distributed systems. Precise performance guarantees are essential for high throughput and time sensitive distributed applications, and hence developers have to chase performance. Most of the time, performance problems are uncovered in non-ideal conditions (such as delayed or dropped packets, node failures, network partitions, etc.) making it hard to identify problems especially in complex interleaved distributed executions. Moreover, common development approaches adopt periodic recovery mechanisms that eventually bring protocol correctness and node consistency. But this strategy is detrimental to performance debugging as it conceals the inability (of the protocol) to attain consistency in their absence. We refer to such problems as latent performance anomalies, which have serious design and implementation problems with big implications to responsiveness, availability and end-to-end behavior of the system.

In our FSE paper, we present MacePC – that automates the discovery of latent performance problems in distributed systems through a systematic analysis of the system. Broadly, MacePC first identifies representative execution performance of the distributed system, following which it sets out to explore the state space to identify executions that lead to anomalous run times. Any sequence of events that leads to a very large (or small) run time is flagged for developer inspection, along with a point in the execution that led to the bad performance. We implemented MacePC over Mace and evaluated it across 5 mature implementations of popular distributed protocols. In this exercise, we identified (previously unknown) bugs in the protocol/implementation that cause poor performance. Continue reading

Posted in Papers | 2 Comments

What is this space for?

This space, the homepage for MaceSystems, is a blog space, for posts related to whats going on in our research group. It’s a place where we may post descriptions of published work or concrete progress of our ongoing work, so you can see what’s going on in our world. For example, soon there will be posts about our recent papers at FSE 2010 and HPDC 2011. Much of the text here will be written by students. We hope this will evolve into a good way for us to keep the community informed about our work and direction, and for potential discussion of the work.

The MaceSystems site still contains links for the original MACEDON code, as well as a link to the webpage describing Mace. Hopefully, as new related projects and tools mature, they too will have their own pages. Meanwhile, there is a Wiki where we can post more fluid and changing content, and in particular on the Wiki you can find a web version (and somewhat updated version) of the PDF documentation available as part of the Mace repository.

Posted in Uncategorized | Leave a comment