rootfoo.org

Introduction to Blackmamba

Blackmamba is a new concurrent networking library for Python, built from the ground up on epoll and coroutines. Asynchronous programming is not new; however, the existing libraries have a steep learning curve. The purpose of Blackmamba is to provide a library for client-side applications that is straightforward and easy to read and write. In other words, Blackmamba aims to be fast not only at run time but also at development time.

Blackmamba was designed with the following goals:

Getting Started

Although developers must build coroutines to use the Blackmamba library, it is not necessary to understand their intricacies. The coroutine requirement actually simplifies program flow, readability, and ease of development. This tutorial should provide sufficient instruction to get started with the Blackmamba library; no prior knowledge of coroutines is assumed or required. However, for those interested in how the library works behind the scenes, a primer on Python coroutines can be found [here].

To connect to a host using the Python socket library:

import socket
sock = socket.socket()
sock.connect((host,port))

To do the same thing with Blackmamba:

from blackmamba import *

def protocol(host, port):
    yield connect(host, port)

A few things are different. First, the Blackmamba API is exposed as coroutines (a concept very similar to system calls). To connect to a host, one simply calls connect(); there is no need to create or manage socket objects or connection state, because the library does that for you. Second, every Blackmamba system call must be prefixed with "yield", which is only valid inside a function. The "yield" expression is Python's way of building Generators and Coroutines. That's it! No need for threads, pools, queues, semaphores, callback chains, deferreds, or complicated interfaces.
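To make the "system call" analogy concrete, here is a toy sketch of the mechanism that yield enables. It is not Blackmamba's implementation: the request tuples, fake_kernel(), and drive() are hypothetical names invented for this illustration, and no networking takes place. A driver runs the generator until it yields a request, carries out the request, and resumes the generator with send(), so the result becomes the value of the yield expression.

```python
# Toy sketch, NOT Blackmamba's code: the request tuples, fake_kernel() and
# drive() are hypothetical names used only to illustrate the yield/send mechanism.

def protocol(out):
    """A protocol: each yield hands a request to the driver and suspends."""
    yield ('connect', 'example.com', 80)
    yield ('write', 'GET / HTTP/1.1\r\nHost: example.com\r\n\r\n')
    # The value passed to send() becomes the result of this yield expression.
    response = yield ('read',)
    out.append(response)
    yield ('close',)

def fake_kernel(request):
    """Pretend to carry out a request; only 'read' produces a result."""
    if request[0] == 'read':
        return 'HTTP/1.1 200 OK'
    return None

def drive(coroutine):
    """Run one coroutine to completion, servicing every request it yields."""
    performed = []
    try:
        request = next(coroutine)              # run up to the first yield
        while True:
            performed.append(request[0])
            result = fake_kernel(request)      # "perform" the system call
            request = coroutine.send(result)   # resume with its result
    except StopIteration:                      # the protocol has finished
        pass
    return performed

received = []
ops = drive(protocol(received))
# ops      == ['connect', 'write', 'read', 'close']
# received == ['HTTP/1.1 200 OK']
```

A real event loop would presumably not complete each request immediately; while one protocol waits on the network, it could resume others. That is the part epoll provides.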

Example

The following is an example of downloading a single web page 100 times. For comparison, this is done first with the standard Python socket library and then with Blackmamba. A line-by-line explanation follows the code.

Downloading a webpage 100 times with the Python socket library:

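import socket

def get(host, port=80):
    msg = "GET / HTTP/1.1\r\nHost: %s\r\n\r\n" % host
    sock = socket.socket()
    sock.connect((host, port))   # blocks until connected
    sock.sendall(msg)            # socket objects provide send/sendall, not write
    response = sock.recv(4096)   # ...and recv, not read; blocks until data arrives
    sock.close()
    print response

# Each download starts only after the previous one has finished.
for i in range(100):
    get('example.com')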

Downloading a webpage 100 times concurrently with Blackmamba:

from blackmamba import *

def get(host, port=80):
    msg = "GET / HTTP/1.1\r\nHost: %s\r\n\r\n" % host
    yield connect(host, port)
    yield write(msg)
    response = yield read()
    yield close()
    print response

def generate(host, count=100):
    for i in range(count):
        yield get(host)

run(generate('example.com'))

Explanation

  1. The get() function is the implementation of a protocol; it defines which network operations to perform and in what order. A Blackmamba protocol can be any coroutine that uses the blackmamba system calls.
  2. Blackmamba's connect system call establishes a connection to the host. The Python "yield" expression suspends execution until the connection is established and then resumes automatically; it is the Python magic that makes the code non-blocking. Any time yield is used in this manner, the function is technically a coroutine. However, as long as each Blackmamba system call is preceded by the yield keyword, that detail can be overlooked.
  3. The write system call is equivalent to socket.send (or socket.sendall). Execution is suspended by yield until the write has completed, and then continues with the next line. While execution is suspended, other functions/protocols may be resumed if their pending network operations have completed.
  4. Read data from the connection and store it in the response variable. Again, yield suspends execution until the response has been fully read. At the moment read() returns all data available and the amount to read cannot be specified. This behavior may change in later versions.
  5. Close the connection and resume upon completion.
  6. There is no gain in using a concurrent networking library if there is only one network operation to perform; typical usage will involve multiple simultaneous network operations. To accomplish this, Blackmamba uses Python Generators. The generate() function in this example is a Generator that yields protocols. Remember, for the sake of Blackmamba programming, a protocol is any function that uses the "yield syscall()" syntax.
  7. The run() function performs all the work. It expects a generator as its first argument, pulls protocols from it, and executes them thousands at a time. Because run() blocks until there are no more protocols to execute, it should be the last thing called by the application.
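The behaviour described in steps 6 and 7 can be approximated by the toy scheduler below. It is a sketch under stated assumptions, not Blackmamba's run(): toy_run() and its max_active parameter are hypothetical, protocols are resumed round-robin instead of when epoll reports their sockets ready, and no results are sent back into them. It does show the two core ideas: protocols are pulled lazily from a generator, and many of them are in flight at once, interleaved at their yield points.

```python
from collections import deque

def toy_run(source, max_active=1000):
    """Interleave protocols pulled from the generator `source` until all finish.

    Hypothetical sketch: a real scheduler would resume a protocol only when
    its pending network operation completes, and send the result back in.
    """
    active = deque()
    trace = []            # every value yielded, in the order it happened
    exhausted = False
    while active or not exhausted:
        # Top up the set of running protocols from the generator.
        while not exhausted and len(active) < max_active:
            try:
                active.append(next(source))
            except StopIteration:
                exhausted = True
        if not active:
            break
        proto = active.popleft()
        try:
            trace.append(next(proto))   # advance to its next "system call"
            active.append(proto)        # not finished yet: requeue it
        except StopIteration:
            pass                        # protocol finished: drop it
    return trace

def proto(name):
    yield name + ':connect'
    yield name + ':close'

def generate(count):
    for i in range(count):
        yield proto('p%d' % i)

trace = toy_run(generate(3), max_active=2)
# trace == ['p0:connect', 'p1:connect', 'p0:close', 'p1:close',
#           'p2:connect', 'p2:close']
```

Pulling protocols lazily means generate() can describe an enormous amount of work without creating every protocol up front; only max_active of them exist at any moment.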

For comparison, the Blackmamba version is only one line longer than the socket version, and it is significantly faster because the 100 downloads run concurrently instead of one after another.

Project Status

Blackmamba originated as a byproduct of my personal quest to understand concurrent and parallel programming techniques. My motivation began with the need to rapidly prototype fast penetration testing tools for brute force and discovery. Tired of the pains of resource sharing in multi-threaded applications, I looked to non-blocking approaches. After getting lost in the world of Twisted deferreds, callback chains, and other confusing and interface-heavy approaches, I discovered the simplicity and beauty of coroutines.

Known Issues

Requirements

At the moment, Blackmamba is available for Python 2.6 on Linux 2.5.66 or later. Coroutine support was added to Python in 2.5 and the epoll module in 2.6. Porting this code to another operating system or language is entirely possible and is left as an exercise for the reader. ;)
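The Linux requirement comes from the epoll module. For readers unfamiliar with it, the following sketch shows the readiness notification epoll provides, using only the standard library; epoll_demo() is a hypothetical name, and this is not Blackmamba code. How Blackmamba wires epoll into run() is not described here, so the idea that its event loop resumes a protocol once epoll reports that protocol's socket ready is an assumption.

```python
# Sketch of epoll readiness notification (Linux only). Plain standard-library
# usage, not Blackmamba code; epoll_demo() is a hypothetical name.
import select
import socket

def epoll_demo():
    """Register one end of a socket pair with epoll and watch it become readable."""
    a, b = socket.socketpair()
    a.setblocking(False)                  # an event loop never blocks on one socket
    ep = select.epoll()
    try:
        fd = a.fileno()
        ep.register(fd, select.EPOLLIN)   # ask to be told when `a` is readable
        before = ep.poll(0)               # no data yet: returns an empty list
        b.sendall(b'ping')
        after = ep.poll(1)                # waits up to 1 s; `a` is now readable
        data = a.recv(16)                 # safe: epoll said data is waiting
    finally:
        ep.close()
        a.close()
        b.close()
    return fd, before, after, data

if hasattr(select, 'epoll'):
    fd, before, after, data = epoll_demo()
    # before is [], after lists (fd, mask) with EPOLLIN set, data is b'ping'
```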

Download