Every once and a while the topic of multimethods (also known as generic dispatch) comes up in the Python world (see here, and here, here too, and finally here, and probably others). For those of you who aren't familiar with the concept, the idea is that you declare a bunch of functions with the same name, but that take different arguments and the language routes your calls to that function to the correct implementation, based on what types you're calling it with. For example here's a C++ example:
#include <iostream>
void special(int k) {
std::cout << "I AM THE ALLMIGHTY INTEGER " << k << std::endl;
}
void special(std::string k) {
std::cout << "I AM THE ALLMIGHTY STRING " << k << std::endl;
}
int main() {
special(42);
special("magic");
return 0;
}
As you can probably guess this will print out:
I AM THE ALLMIGHTY INTEGER 3 I AM THE ALLMIGHTY STRING magic
You, the insightful reader, are no doubt fuming in your seats now, "Alex, you idiot, Python functions don't have type signatures, how can we route our calls based on something that does not exist!", and right you are. However, don't tell me you've never written a function that looks like:
def my_magic_function(o):
if isinstance(o, basestring):
return my_magic_function(int(o))
elif isinstance(o, (int, long)):
return cache[o]
else:
return o
Or something like that, the point is you have one function that has a couple of different behaviors based on the type of it's parameter. Perhaps it'd be nice to separate each of those behaviors into their own function (or not, I don't really care what you do).
I was saying that a bunch of people have already implemented these, why am I? Mostly for fun (that's still a valid reason, right?), but also because a bunch of the implementations make me sad. Some of them use crazy hacks (reading up through stack frames), a few of them have global registrys, and all of them rely on the name of the function to identify a single "function" to be overloaded. However, they also all have one good thing in common: decorators, yay!
My implementation is pretty simple, so I'll present it, and it's test suite without explanation:
class MultiMethod(object):
def __init__(self):
self._implementations = {}
def _get_predicate(self, o):
if isinstance(o, type):
return lambda x: isinstance(x, o)
assert callable(o)
return o
def register(self, *args, **kwargs):
def inner(f):
key = (
args,
tuple(kwargs.items()),
)
if key in self._implementations:
raise TypeError("Duplicate registration for %r" % key)
self._implementations[key] = f
return self
return inner
def __call__(self, *args, **kwargs):
for spec, func in self._implementations.iteritems():
arg_spec, kwarg_spec = spec
kwarg_spec = dict(kwarg_spec)
if len(args) != len(arg_spec) or set(kwargs) != set(kwarg_spec):
continue
if (all(self._get_predicate(spec)(arg) for spec, arg in zip(arg_spec, args)) and
all(self._get_predicate(spec)(kwargs[k]) for k, spec in kwarg_spec.iteritems())):
return func(*args, **kwargs)
raise TypeError("No implementation with a spec matching: %r, %r" % (
args, kwargs))
And the tests:
import unittest2 as unittest
from multimethod import MultiMethod
class MultiMethodTestCase(unittest.TestCase):
def test_basic(self):
items = MultiMethod()
@items.register(list)
def items(l):
return l
@items.register(dict)
def items(d):
return d.items()
self.assertEqual(items([1, 2, 3]), [1, 2, 3])
# TODO: dict ordering dependent, 1 item dict?
self.assertEqual(items({"a": 1, "b": 2}), [("a", 1), ("b", 2)])
with self.assertRaises(TypeError):
items(xrange(3))
def test_duplicate(self):
m = MultiMethod()
@m.register(list)
def m(o):
return o
with self.assertRaises(TypeError):
@m.register(list)
def m(o):
return o
if __name__ == "__main__":
unittest.main()
Bon appétit.
You can find the rest here. There are view comments.
This morning Facebook announced, as had been rumored for several weeks, a new, faster, implementation of PHP. To someone, like me, who loves dynamic languages and virtual machines this type of announcement is pretty exciting, after all, if they have some new techniques for optimizing dynamic languages they can almost certainly be ported to a language I care about. However, because of everything I've read (and learned about PHP, the language) since the announcement I'm not particularly excited about HipHop.
Firstly, there's the question of what problem HipHop solves. It aims to improve the CPU usage of PHP applications. For all practical purposes PHP exists exclusively to serve websites (yes, it can do other things, no one does them). Almost every single website on the internet is I/O bound, not CPU bound, web applications spend their time waiting on external resources (databases, memcache, other HTTP resources, etc.). So the part of me that develops websites professionally isn't super interested, Facebook is in the exceptionally rare circumstance that they've optimized their I/O to the point that optimizing CPU gives worthwhile returns. However, the part of me that spends his evenings contributing to Unladen Swallow and hanging around PyPy still thought that there might be some interesting VM technology to explore.
The next issue for consideration was the "VM" design Facebook choose. They've elected to compile PHP into C++, and then use a C++ compiler to get a binary out of it. This isn't a particularly new technique, in the Python world projects like Shedskin and Cython have exploited a similar technique to get good speed ups. However, Facebook also noted that in doing so they had dropped support for "some rarely used features — such as eval()". An important question is which features exactly, they had dropped support for. After all, the reason compiling a dynamic language to efficient machine code is difficult is because the dynamicism defeats the compiler's ability to optimize, but if you remove the dynamicism you remove the obstacles to efficient compilation. However, you're also not compiling the same language. PHP without eval(), and whatever else they've removed is quite simply a different language, for this reason I don't consider either Shedskin or Cython to be an implementation of Python, because they don't implement the entire language.
This afternoon, while I was idling in the Unladen Swallow IRC channel a discussion about HipHop came up, and I learned a few things about PHP I hadn't previous realized. The biggest of these is that a name bound to a function in PHP cannot be undefined, or redefined. If you've ever seen Collin Winter give a talk about Unladen Swallow, the canonical example of Python's dynamicism defeating a static compiler is the len() function. For lists, tuples, or dicts a call to len() should be able to be optimized to a single memory read out of a field on the object, plus a call to instantiate an integer object. However, in CPython today it's actually about 3 function calls, and 3 memory reads to get this data (plus the call to instantiate an integer object), plus the dictionary lookup in the global builtins to see what the object named len is. That's a hell of a lot more work than a single memory read (which is one instruction on an x86 CPU). The reason CPython needs to do all that work is that it a) doesn't know what the len object is, and b) when len is called it has no idea what its arguments will be.
As I've written about previously, Unladen Swallow has some creative ways to solve these problems, to avoid the dictionary lookups, and, eventually, to inline the body of the len() into the caller and optimize if for the types it's called with. However, this requires good runtime feedback, since the compiler simply cannot know statically what any of the objects will actually be at runtime. However, if len could be know to be the len() function at compile time Unladen Swallow could inline the body of the function, unconditionally, into the caller. Even with only static specialization for lists, dicts, and tuples like:
if isinstance(obj, list):
return obj.ob_size
elif isinstance(obj, tuple):
return obj.ob_size
elif isisinstance(obj, dict):
return obj.ma_fill
else:
return obj.__len__()
This would be quite a bit faster than the current amount of indirection. In PHP's case it's actually even easier, it only has one builtin array type, which acts as both a typical array as well as a hash table. Now extend this possibly optimization to not only every builtin, but every single function call. Instead of the dictionary lookups Python has to do for every global these can just become direct function calls.
Because of differences like this (and the fact that PHP only has machine sized integers, not arbitrary sized ones, and not implementing features of the language such as eval()) I believe that the work done on HipHop represents a fundamentally smaller challenge than that taken on by the teams working to improve the implementations of languages like Python, Ruby, or Javascript. At the time of the writing the HipHop source has not been released, however I am interested to see how they handle garbage collection.
You can find the rest here. There are view comments.
Recently I had a bit of an interesting problem, I needed to define a way to represent a C++ API in Python. So, I figured the best way to represent that was one class in Python for each class in C++, with a functions dictionary to track each of the methods on each class. Seems simple enough right, do something like this:
class String(object):
functions = {
"size": Function(Integer, []),
}
We've got a String class with a functions dictionary that maps method names to Function objects. The Function constructor takes a return type and a list of arguments. Unfortunately we run into a problem when we want to do something like this:
class String(object):
functions = {
"size": Function(Integer, []),
"append": Function(None, [String])
}
If we try to run this code we're going to get a NameError, String isn't defined yet. Django models have a similar issue, with recursive foreign keys. Django's solution is to use the placeholder string "self", and have a metaclass translate it into the right class. Also having a slightly more declarative API might be nice, so something like this:
class String(DeclarativeObject):
size = Function(Integer, [])
append = Function(None, ["self"])
So now that we have a nice pretty API we need our metaclass to make it happen:
RECURSIVE_TYPE_CONSTANT = "self"
class DeclarativeObjectMetaclass(type):
def __new__(cls, name, bases, attrs):
functions = dict([(n, attr) for n, attr in attrs.iteritems()
if isinstance(attr, Function)])
for attr in functions:
attrs.pop(attr)
new_cls = super(DeclarativeObjectMetaclass, cls).__new__(cls, name, bases, attrs)
new_cls.functions = {}
for name, function in functions.iteritems():
if function.return_type == RECURSIVE_TYPE_CONSTANT:
function.return_type = new_cls
for i, argument in enumerate(function.arguments):
if argument == RECURSIVE_TYPE_CONSTANT:
function.arguments[i] = new_cls
new_cls.functions[name] = function
return new_cls
class DeclarativeObject(object):
__metaclass__ = DeclarativeObjectMetaclass
And that's all their is to it. We take each of the functions on the class out of the attributes, create a normal class instance without the functions, and then we do the replacements on the function objects and stick them in a functions dictionary.
Simple patterns like this can be used to build beautiful APIs, as is seen in Django with the models and forms API.
You can find the rest here. There are view comments.
A while ago Eric Holscher blogged about things he didn't learn in college. I'm going to take a different spin on it, looking at both things that I did learn in school that I wouldn't have learned else where (henceforth defined as my job, or open source programming), as well as thinks I learned else where instead of at college.
- Big O notation, and algorithm analysis. This is the biggest one, I've had little cause to consider this in my open source or professional work, stuff is either fast or slow and that's usually enough. Learning rigorous algorithm analysis doesn't come up all the time, but every once in a while it pops up, and it's handy.
- C++. I imagine that I eventually would have learned it myself, but my impetus to learn it was that's what was used for my CS2 class, so I started learning with the class then dove in head first. Left to my own devices I may very well have stayed in Python/Javascript land.
- Finite automaton and push down automaton. I actually did lexing and parsing before I ever started looking at these in class (see my blog posts from a year ago) using PLY, however, this semester I've actually been learning about the implementation of these things (although sadly for class projects we've been using Lex/Yacc).
- Compilers. I've learned everything I know about compilers from reading my papers from my own interest and hanging around communities like Unladen Swallow and PyPy (and even contributing a little).
- Scalability. Interesting this is a concept related to algorithm analysis/big O, however this is something I've really learned from talking about this stuff with guys like Mike Malone and Joe Stump.
- APIs, Documentation. These are the core of software development (in my opinion), and I've definitely learned these skills in the open source world. You don't know what a good API or documentation is until it's been used by someone you've never met and it just works for them, and they can understand it perfectly. One of the few required, advanced courses at my school is titled, "Software Design and Documentation" and I'm deathly afraid it's going to waste my time with stuff like UML, instead of focusing on how to write APIs that people want to use and documentation that people want to read.
So these are my short lists. I've tried to highlight items that cross the boundaries between what people traditionally expect are topics for school and topics for the real world. I'd be curious to hear what other people's experience with topics like these are.</div>
When I first heard Rob Pike and Ken Thompson had created a new programming language my first instinct is, "I'm sure it'll be cool, these are brilliant guys!" Unfortunately that is not the case, mostly because I think they made some bad design choices, and because the "cool new things" aren't actually that new, or cool.
The first major mistake was using a C derived syntax: braces and semicolons. I know some people don't like significant whitespace, but if you aren't properly indenting your code it's already beyond the point of hopelessness. The parser ought to use the existing structure instead of adding more punctuation. Readability is one of the most important attributes of code, and this syntax detracts from that.
The next mistake is having a separate declaration and assignment operator. I understand that the point of this operator is to reduce the repetition of typing out the types name both in declaring the variable and in initializing the value. Yet it seems the purpose of the := operator is to avoid typos in variable names that are possible by making all assignment an implicit declaration if the variable wasn't already declared. I can see myself making many more typos by forgetting to use the := operator, and in cases where I make a typo in a variable name it would inevitably be caught by the compiler when I attempted to actually use it (the fact that this is an attempted declaration means the variable won't have been declared elsewhere).
The final mistake was not providing generics. C++'s templates is one of the things that make the language head and shoulders more useful for me than C for tasks I need to preform; generics allow one to provide reusable data structures. While Go seems to have something akin to generics with their map data structure, it disappointingly doesn't appear to be exposed in any way to user code. One of the things I've found makes me most productive in Python is that any time I need to perform a task I simply pick the data structure that does what I want and it is efficiently implemented. Without generics, I don't see a way for a statically typed language to offer this same breadth of data structures without each of them being a special case.
In terms of features which I believe are overhyped the most important one is the "goroutine". As best I can tell these are an implementation of fibers. Constructs like concurrency should not be granted their own syntax, especially when they can be cleanly implemented using the other constructs of a language, look at the C library libtask as an example.
Further, the handling of interfaces, though interesting, appears to be an implementation of C++0x's proposed concepts (which won't be in the final standard). I view this feature as something that is most useful in the context of generics, which Go doesn't have. The reason for this is to be able to make compile time assertions about the types that are being templated over. Doing anything else is more clearly implemented as abstract inheritance.
I'm not writing off Go permanently, but I'm not enthused about it, someone will probably have to let me know when I need to look again as I won't be following along. Enjoy your internet travels.
You can find the rest here. There are view comments.
I've now set up Al to be using GMP for all integers, and I'll be doing the same for floats once they get implemented. I haven't started benchmarking yet, but it can compile and calculate the factorial of 50000 pretty quickly, and in Python vanilla that would result in a RuntimeError due to a stack overflow, so it's a good starting point. Sorry for such a short post, I'm pretty tired today.
You can find the rest here. There are view comments.
Over the past few weeks I've been working on compiling my programming language. At present it works by translating the source into C++, and then you compile that with your compiler of choice. It's garbage collected, using the excellent Boehm GC library. At present it can only compile a limited subset of what it can actually parse, or what the interpreter supports. As of today thought it can compile and run a factorial function, however it can't calculate any factorial greater than 12, due to integer overflow issues. To solve this I'm either going to use GMP or roll my own Bignum library, and I'm not sure which yet. On the whole though, progress is good. The generated C++ is about as good as it could be considering the limitations inherent in turning an interpreted language into a compiled one. I haven't started benchmarking it yet, that was originally going to be the point of today's post before I ran into the integer overflow issues, however this is an example of the C++ code that is generated.
Give this Al(also valid Python):
def fact(n):
if n == 1 or n == 0:
return 1
return n * fact(n-1)
print(fact(1))
print(fact(12))
It generated the following C++:
#include "src/base.h"
AlObj *fact;
class f0:public AlFunction
{
public:
virtual AlObj * operator () (ARG_TYPE args, KWARG_TYPE kwargs)
{
AlObj *n = args.back ();
args.pop_back ();
if (*
((*((*(n)) == (AlObj *) (new AlInt (1))))
|| (*(n)) == (AlObj *) (new AlInt (0))))
{
return (AlObj *) (new AlInt (1));;
}
ARG_TYPE t0;
t0.push_back ((*(n)) - (AlObj *) (new AlInt (1)));
return (*(n)) * (*fact) (t0, KWARG_TYPE ());
}
};
int
main ()
{
fact = new f0 ();
ARG_TYPE t1;
ARG_TYPE t2;
t2.push_back ((AlObj *) (new AlInt (1)));
t1.push_back ((*fact) (t2, KWARG_TYPE ()));
(*print) (t1, KWARG_TYPE ());
ARG_TYPE t3;
ARG_TYPE t4;
t4.push_back ((AlObj *) (new AlInt (12)));
t3.push_back ((*fact) (t4, KWARG_TYPE ()));
(*print) (t3, KWARG_TYPE ());
}
All said and done, I'm pretty impressed! You can get all the code here, all the compilation work is in the code-generation branch.
You can find the rest here. There are view comments.