Sunday, March 22, 2015

And now, for something completely different, and inconsequential


Taking a break from composite structure management inside my language, I wrote a little Python script to generate k-ominoes. I was first introduced to the 5-tile version, "pentominoes". If I recall correctly, somebody manufactured a playkit/puzzle of a set of all(?) of the pentominoes, and one of the challenges was to fit them all into the box.

Playing around with tetrominoes will be familiar to people who have played Tetris before.

One vaguely tongue-in-cheek idea I have for a game is "k-tris", where you get to select the value of k, that is, the number of tiles making up the pieces that you'll be dropping. For k=4, the game is Tetris, and might earn me a cease and desist, so perhaps I only allow odd values for k. k=3 becomes a pretty trivial game, and even moreso for lower values. Not that it's my job to police that. Feel free to play one-block-Tetris if you want. My guess is that "Pentris" (k=5) or "Septris" (k=7) will be really hard to play. But there's no reason it'd be any harder to program than the more familiar versions.

Sunday, March 15, 2015

Storing into aggregate types

I've been working on updating the grammar to handle aggregate types (structs, classes, that sort of thing). This weekend, I've adapted some of the C grammar and got things running again.




The LLVM code you see up there uses the "insertvalue" instruction, which is the way to write into a struct. The paired instruction is "extractvalue", which the compiler doesn't yet generate.

There are a few other issues with the generated code - the correct offset isn't being used. I think the "1" at the end of each line is an index to the element within the aggregate structure. I'm computing an offset for x and y members of my 2d vectors, but it's not piped through properly.

And then there's the loading and reloading of each of the aggregate instances. I know that LLVM has some aggressive optimization, but I'd rather not rely on it to clean this sort of thing up.

Friday, March 13, 2015

K&R 2e


I don't have any progress to report on the development of the language - I'm currently trying to get structs working, and that's involved some pretty substantial breakage of the grammar. I had a grammar rule for assignments which was something like <variablename> = <expression>, which was sufficient for a while - I could evaluate functions and bind them to variables, I could have mathematical expressions, I could take the contents of variables, and assign them, things were rosy.

But then I tried "velocity.x = 3", and my structure didn't hold up anymore. I had already created the velocity struct, that part of the work was pretty much working (see my earlier posts about migrating to llvmlite in order to get aggregate types being generated). I even knew that the "x" member of the velocity structure was the zeroeth element, because I was keeping that information around.

I just ran into a stumbling block in how to represent a struct member as a "L-value", a thing that can be assigned to.

I looked at some online references for the C grammar, and found an ANSI C grammar which sent me down a rabbit hole of rewriting my expression productions. I looked at either pycparser or maybe pycparser (honestly, probably the first one) and I see a bunch of ways that my code can be cleaner, but I'm still not parsing struct assignment correctly.

I stumbled across a LALR tutorial which makes everything seem simple.

And, I guess, for simple languages, simple parsers are simple. But things get hairy quick.


C is, in a lot of ways, a simple language. I learned it in maybe a week, probably much less than that. I bought myself the K&R "white book" (pictured above) around Christmas of 1989. Hm, I'm not sure all of the chronology of this story makes sense, because I'm pretty sure I bought myself an IBM PS/2 286 in January of 1990, and I taught myself C on that 286 with the clackety keyboard and the VGA monitor and the 20 MB hard drive.

C is an easy language to dive into, and it's pretty easy to understand how things work. I still smile at the characterization of C as having the power of assembly language with the readability of assembly language. Looking at C and assembly today, it's easy for me to read either one of them, but the difference is the level of abstraction - there's more "idioms per line" in C than there is in assembly.

And then, there's C++, which I taught myself bits of in 1992 and 1993. Again, the structure of the language is there to afford more idioms per line than C. But I never found C++ quite as satisfying as C was, and over time, I managed to ship plenty of working C++ code, and pass several programming interviews, and I knew about the "diamond of death" multiple inheritance issues, and I knew about vtables, and it was fine, but I kept looking for something else.


On the flip side, when you look at the grammar of C, it's hard to call it "clean". There are simpler languages out there (LISP and PostScript come to mind, as well as a few proprietary languages that I've had to use professionally). In each of these environments, there's a balance between the complexity of the language tools (parser, compiler, interpreter, etc) and the productivity of the people using these tools. At one company I worked at, an engineer insisted that a scripting language was important to the project, but he couldn't convince management to allocate time for it, so a simple PostScript(ish) interpreter was coded up over a weekend, and that engineer was happy. Management was happy, because people were still on-schedule, and then junior engineers were given the task of creating a coding culture in this stack-based language.

Toward the end of that project, I was approached to see if I wanted to add "regular expressions" onto the language, which didn't make any sense to me. I'm pretty sure it didn't make any sense to the person asking me, either. Perhaps he meant infix operators, which would have been a substantial change to the language.


All of this to say, I still have fondness for C. I dug around in the house and found my 25 year old copy of K&R, and I'll be using that as a reference as I do some more hacking with the grammar of my language this weekend. Hopefully, I'll move closer to that sweet spot of clean language design balanced against clean language implementation.

Sunday, March 8, 2015

Test Driven Compiler Development


Some more (hard to screenshot) progress, some more features that turn up to be annoying in real development.

A while ago, I was trying to get aggregate types (e.g. structs, classes, etc) into my language so that I could represent the position of the ball in my pong game as a single variable. I had trouble with getting LLVMPY to assign to or read from members of an instance of an aggregate type.

I dug into the mailing list for LLVMPY and discovered that LLVMPY was no longer maintained, and the cool new thing is llvmlite, which is a thinner wrapper around LLVM. Ok, so that's what I need to take advantage of the full LLVM feature set, that's fine.

So, I set aside the work on aggregated types and began rewriting the compiler to use llvmlite. I'm using git, so this was as easy as creating a new branch on my local machine.

This being the recommended upgrade path, there is a LLVMPY emulation layer, which you can presumably just import instead of the old library. I figured switching all at once seemed the right thing to do, so at first I ignored the migration layer. Much of the migration was straightforward, but I bumped into some bits for instance of managing global variables, and looking at the migration layer was a really good reference to see how the migration might be implemented.


As I was making my changes, I compiled several of my earlier test cases, verifying that the compilation completed, and then eyeballing the output. A better approach would be to embed expected output into the tests themselves and then automatically verify the built output against the expected output.

One step in that direction that I did implement is a simple python script that verifies that all the binaries build without compilation error, which is what's in the screenshot here:


You can see "built 17/17". Can't do much better than that.

Along the way, I updated some of the pong code to start using floats for position and velocity, but then I started getting a lot of errors where I was comparing a floating point number to an integer. My language doesn't allow this, and it has no automatic type conversion. Indeed, it has no conversion at all, so pong wasn't compiling, and I had to wade through a bunch of these lines.

Worse than changing numerous lines of code was the difficulty of finding the lines where the problems were. Several of the errors reported a line number, but they were invariably way too high, which resulted from doing two passes over the source code.

I fixed that, and now my error messages are more useful, and all of my tests compile and run.


I proceeded to check in my llvmlite branch, which now needs to be merged with my work on aggregate types.

Saturday, March 7, 2015

It prints 7

I've installed llvmlite, the successor to llvmpy, which is itself the successor to something else.

I've installed the prerequisites, run the tests, and begun poking around with the sample code and the docume... no. The nice thing about llvmlite is that it's a really thin layer around the LLVM library, so the LLVM documentation is going to be my reference.

I proceeded to hack one of the test scripts to output LLVM code to match one of my earlier tests.


and, as advertised, it prints 7:

Wednesday, March 4, 2015

Or, maybe I could do it that way.

So, my language recognizes struct declarations. And allocations. And I've been working on assigning to instances of these structs.

I had false starts that I've mentioned here before, specifically about creating anonymous structs, which are probably useful for a language feature I want to do later, but not really relevant right now.

And so, I've been looking for examples of how it's supposed to be done. I found vague references to "getelementptr" in LLVM, which in LLVMPY is exposed as "gep", because hey, it's hard to type stuff.

Interesting side trip: my language is pretty close to C, so I wrote a little bit of C code demonstrating the behavior I want in my language, and I used "clang" to compile it to LLVM.

clang -S -emit-llvm teststruct.c 

Easy peasy. Dropped a bunch of LLVM asm into teststruct.s, which used that getelementptr thing that I was trying to use. So, maybe I'm going in the right direction.

More searching, more experimentation (they call it Computer Science because we use the scientific method) and I start exploring the LLVMPY issue tracker, and I discover a feature request to support structures. Ok, good, it's a priority... ooooh. No. That's been outstanding for going on two years now.

And I start poking around the LLVMPY users' mailing list, and I discover a post suggesting that LLVMPY users switch over to LLVMlite.


So, I guess that's the next thing on my TODO list - migrate over to LLVMlite. And maybe, you know, read the documentation.

Tuesday, March 3, 2015

On my way to implementing classes



I've been working on adding classes to my language, and I've got very little to show for it so far. Previous modifications have been relatively small, adding a little bit to the grammar, handling one new production, a small test case, and post here.

The test case I'm working on is this:


I added class definitions to my language, and that seems to be working OK, and I seem to be able to declare the instances of my classes just fine, but assigning to the members (e.g. "scrpos.x = 160;") has been giving me a hard time up to this point.

The place where I was stuck was in handling the "MemberRef" production (what happens when the parser processes the "." member operator). I have an object that I know is a struct, and I know it's got two int elements, and I know I want to refer to the "x" element, but I was banging my head, trying to figure out how to get the "Vec2i" name for that object.

Turns out, when I was constructing my struct object originally, I had ignored an optional parameter that is the struct's name. Now I think I have what I need.


On the left, in black-on-white, you'll see llvm.core.Type.struct(types, self.classname). This is where I'm passing in "Vec2i" in as the name of the struct I'm constructing. On the right, in white-on-black, you'll see my debugging spew, where it is reporting that the "base type name", the name of the type of scrpos in my sample at the top, that's "Vec2i", where I need to be able to figure out what "x" means.

A couple lessons here:

1) (boring) llvm.core.Type.struct can take more than one parameter, just because the code runs doesn't mean that it's doing what you want.

2) LLVMPY's documentation can be sparse. However, in this case, I could have discovered my problem by rereading the documentation on the struct constructor. By passing in no name, earlier, I was implicitly asking for an "opaque" struct.

3) source control is your friend. I don't think I need to tell you that, but I've ripped out a lot of stuff, rebuilding stuff, and the only way I can feel confident to do this is to know that I've got a checked in "working" version to revert to if things go wrong. Indeed, I've added a bunch of stuff that's now unnecessary, and I can filter out the unuseful stuff before I commit it.