Wednesday, September 23, 2015

Building tools for the 65816 Übersquirrel

In which our protagonist proves how amazingly productive procrastinators can be, unless color choices are involved.

Remember last time we talked about switching from the 6502 to the 65816, and that we'd have to write our own assembler and emulator? Well, it turns out that there is a reason that the world is not overflowing with emulators for a 8/16- bit hybrid chip with a banked memory structure and emulation/native modes: It's complicated, to the point where we're already dealing with questions the experts describe as "esoteric" and have to test on actual hardware to answer. That is never, ever a good sign.

So my chips are gathering dust in a box while I'm writing the emulator from hell.

Forth code from the crude65816. In which other language can you get away with naming a function "24>lsb/msb/bank" ?

But wait, how do you test an emulator? You need an assembler, because hand-assembling test routines will drive you insane. So we set the emulator aside and upgrade our Typist's Assembler for the 65C02 for the 65816. As a nice side effect, we can replace some of the most confusing mnemonics (PEA becomes PHE.#), and there is none of this "LDA <$123456" or "LDA !$32" stuff.

A simple testing routine for the emulator in Typist's Assembler. Loop names get silly when the hours get long.

But wait, normal Forth syntax highlighting looks like crap in vim for assembler. So we set the assembler aside and write a vim syntax plugin for Typist's Assembler.

Code from the vim plugin for Typist's Assembler. I am not willing to discuss how much time I spent deciding which colors to use.

Back to the emulator? Not so fast, because we really should have a test suite for it. Because this is Forth, it is simply an add-on (accessed by INCLUDE CRUDETEST.FS after loading the emulator). Because I'm easily amused, we display the result of the test run in a matrix.

Early test run of the matrix of results from the test suite for the emulator. Note the complete lack of spoons.

After all of that, how far along is the emulator? In pre-ALPHA with maybe a third of the opcodes working. Since I went for the simple instructions first (duh), it's only going to get harder from here on out. In other words, this is going to take a while, quite possibly till the end of the year.

Still, we can dream. In the next entry, we'll get back to the overreaching design plans for the Übersquirrel Mark I and that bad Mass Effect joke I promised.

Friday, June 12, 2015

Hooking up with big sister: Switching to the 65816

In which our protagonist whines about getting sidetracked by ancient board games and dead Romans and then makes excuses for abandoning the very processor he claimed he loved so much.

I should probably explain why these entries are few and far between. Apart from the demands and joys of job, family, and house, life in the 21st century turns out to be really, really distracting. In the past weeks, I've rediscovered the game Go -- much more easy to learn with YouTube lectures than with books back in the 80s -- and lost far too much sleep reading The Swerve: How the World Became Modern by Stephen Greenblatt in two late-night sittings. All this means less time for the Übersquirrel.

However, one thing I decided very early with this project is that it is going to be long-term, escapist in the best sense, and not another contributor to my to-do-list. So, it's going to progress in fits and starts.

My latest fit is switching to a different processor.

Let's not pretend this isn't a biggie. The Übersquirrel started out because of my love of the 6502, and abandoning it seems cruel and unfair. However, Tali Forth has shown me that though you can do an amazing amount with a small processor, 64 KByte of memory are simply not enough. A few megabytes, now that should be fine for any system without a GUI.

Fortunately, we can have our cake and eat it, too, regardless of what GlaDOS says. The 6502 has a big sister, the 65816 8/16-bit hybrid CPU. Historically, it only saw brief use, because computer builders quickly switched to full 16- or 32-bit designs such as the 68000 that powered my beloved Atari ST. But while those chips have fallen by the wayside from an hobbyist's point of view, the 65816 is still in production -- and the last CPU standing wins.

The 65816, which really doesn't look much different than the 65c02. Photo by Anthony King, placed in the public domain

The main advantage of the 65816 from our point of view is its address space of 24 bits, giving us 16 MByte of RAM and/or ROM. Consider the memory problem solved. Transition from the eight bit world is easy because its big sister starts off in "emulation" mode which makes it pretend it is a 6502 (with minor differences). In the beginning, we can use our old software. Flipping a bit turns it into a 16 bit machine. That should make lots of things easier.

However, this is also where the problems start. As a hybrid processor, the 65816 is more complicated than a simple 16 bit CPU. For instance, you set the the accumulator (A) and index registers (X and Y) to be eight or 16 bits independently of each other. This means a simple command such as TAX -- transfer contents of the accumulator to the X register -- has four different cases:

A is 16 bit, X is 16 bit
A is 16 bit, X is 8 bit
A is 8 bit, X is 16 bit
A is 8 bit, X is 8 bit

The command itself is always the same: TAX. Remembering what width A, X, and Y have at any given time is one of the things that make programming the 65816 harder. Also, that nice 16 MByte memory range is split up into 256 64 KByte segments. On a more practical level, there is no free emulator for Linux available and fewer assemblers that for the 6502. We might have to write both ourselves.

Our breadboarded 6502 machine was the Übersquirrel Mark Zero, so this new project will be the Mark I. In the next entry, I'll be presenting the rough design of this computer, including a slightly radical step regarding the voltage, and a new horrible pun based on the Mass Effect universe.

Tuesday, March 31, 2015

They grow up so quickly: Tali Forth now in BETA

In which our protagonist claims a milestone for Tali and then sends the poor woman out into the big, evil world with nary a drone to defend herself.

At some point you just have to draw a line and get the thing out the door: I have designated Tali Forth, my small, simple Forth for the 65C02, as feature-complete, and so it is now officially in BETA.

This doesn't mean that it is even close to being finished, of course. Large parts haven't been tested beyond "it compiles", and don't even think of talking to me about optimization. But we've fooled around with software so long that it is time to get back to hardware. It's good enough for now.

[BETA actually began a few weeks ago, but I was off with the Inquisition in Thedas. Long story, involves dragons.]

So much has changed with the hardware by now that we're going to have to discuss everything in separate post. Until then, interested parties might want to read up on the SPI interface and the 65816 CPU. Megabytes, here we come!

Tuesday, January 6, 2015

My very own 65c02 assembler in Forth

In which our protagonist insists on writing his own assembler in a quirky language for a 30-year-old CPU and then makes everything worse by inventing his own syntax.

One of the things Forth traditionally excels at is assembler programming -- see Brad Rodriguez' BYO Assembler in Forth for more details on this. Since at some point I'm probably going to switch from the 65c02 to the more powerful 65816, which will likely involve writing my own assembler, I decided to give it a try.

The result is A Typist's 65c02 Assembler in Forth [GitHub], so named because I used the occasion to create a different syntax that doesn't have those annoying uppercase symbols like "$" and "(". In other words, it's more fun for ten-finger typists. See the introduction on for details. I've just started dogfooding larger programs, so it might be a while before that sort of code shows up here.

(If anybody is interested in assemblers as such, and why else would you be here, take a look at the free PDF version of Assemblers
and Loaders
by David Salomon.)

In the next entry, we will at least try to get back to our regular scheduled programming.

Wednesday, November 12, 2014

The Strange Case of the Branching Dictionary

In which our protagonist tries to convince us that the weird structure of Tali Forth's dictionary is actually an amazingly clever idea.

So the weather in Germany has finally turned horrible again, which means that I can return to indoor pursuits. Ask me again why Autumn is my favorite time of year.

Anyway. Many months ago, we left off with the promise that we'd be examining the dictionary of Tali Forth. This contains the Forth "words" that make up the language, things like DROP or LOOP.

The basic structure is simple: A linked list of entries. When we define a new word, it is added to the dictionary. When we type a word, the dictionary is searched from newest to oldest entries until it is found. Remember the flow chart in the previous entry: If a word is not found, Forth checks to see if it is a number. If that fails, it complains.

If we take a closer look at a single dictionary entry, we see it consists of two parts: A "head" and a "body". The body is the actual "payload" of code that is executed when the word is executed. For example, the DROP instruction in Tali Forth is are simply the 65c02 instructions INX INX plus a RTS at the end. The header provides stuff like the link to the next entry in the list, the string of the word ("DROP"), the length of that string, and some flags.

There are various ways to construct a Forth dictionary entry (Brad Rodriguez has an excellent explanation of what the different parts do, including examples from other Forth versions). The problems are always the same: We need a pointer to the start of the dictionary, so we know where to find the information in the header, and we need a pointer to the code in the body to execute the word the entry represents.

Of course, it would be really cool if those pointers were one and the same, right? Heh. So here is how Tali Forth solves the problem:

Ascii drawing of a the Tali Forth dictionary entry, version ALPHA 004. Including as an image for those of who insist on using fancy-pants variable space fonts instead of monospaced.

We'll get to the first two bytes -- the BRA stuff -- in a moment. The byte after that includes the flags and five bits for the length of the name string. Then, we have a pointer to the next entry in the dictionary, followed by the pointer to the end of our body. This is used for compiling short words directly into machine code. The next to fields are of variable length: The the word as an Ascii string, and the payload of code in the body itself.

Now, notice we do something really ... different: Every entry starts with a BRA instruction. This is only available on the 65c02, not the original 6502, and tells the CPU to jump ahead (or back). Here, we have it set up so it jumps to the beginning of the body.

What this means is that we can execute any Tali Forth word in the dictionary by simply jumping to its first byte. Which just so happens to be where the link from the previous entry points us to. This way, we only have to deal with one 16-bit address and don't have to read or calculate where the body begins. In Forth terms, this is our Execution Token (XT). Pretty cool, right? Well, except that we pay a hefty penalty. Every single dictionary entry is two bytes longer and takes up to four cycles longer.

In the end, I decided to go ahead with the setup because it is simpler than all the others I could think of. Remember, simplicity is a primary goal.

(What I'm more worried about is if we actually need the pointer to the end of the body, or if we can't figure out another way of figuring out where to stop compilation.)

In the source code, an entry looks like this:

Tali Forth source code for the DROP entry in the dictionary. Remember, if this looks different to you than other assembler code, it is because we use the Orphis Assembler 2.0

By the time of the next entry, we'll probably have cleaned up the code enough to have it posted on GitHub like a real project. As long as it keeps raining in Germany, that is.

Saturday, May 17, 2014

The Dragon-Free Internals of Forth

In which our protagonist is amazed that the core structure of Forth is so simple even he can understand it.

Forth was not invented by a bunch of eggheads at a university, but by a guy who needed to get real work done. Chuck Moore started programming in the 50's at the Smithsonian Astrophysical Observatory with code that computed ephemerides, orbital elements and satellite station positions. After graduate school at Stanford, he worked as freelance programmer, using a bunch of routines and concepts he had developed on the way over and over again:

Working in Fortran, Algol, Jovial, PL/I and various assemblers, he continued to use his interpreter as much as possible, literally carrying around his card deck and recoding it as necessary.

This "interpreter" would evolve into the language now called Forth. Moore rewrote it for various computers systems, and even later, hardware remained a moving target:

Throughout the 70's, as he implemented Forth on 18 different CPUs (...), he invariably wrote for each his own assembler, his own disk and terminal drivers, even his own multiply and divide subroutines (on machines that required them, as many did).

Keeping stuff simple not only was a philosophy, but a necessity.

So while the internals of boffin languages can be so complex and scary they make people think of dragons, the core of Forth is so simple it seems almost snarky:

The flow chart of a Forth system. From Lost at C? Forth might be the answer by Tom Napier and Eric Krieg.

The only non-trivial concept here are the "immediate" words: As the name says, those are executed right way -- "immediately" -- even if we're in compile mode. LITERAL for instance is a compile-only, immediate word that makes sure the word will later push a number on the stack.

So "compiling" in Forth means adding new words -- what other languages call "functions" -- to the dictionary. In its simplest form, this does not involve math, complex algorithms, or dragons (though of course we have nothing against dragons, because dragons are awesome). In other words, it is something even we can do.

The major question then becomes how to design the dictionary. In the next entry, we'll look at that more closely, because Tali Forth does something weird.

Tuesday, April 15, 2014

Real Jedi make their own Forth

In which our protagonist takes matters into his own hands, needlessly quotes Chaucer, and gets his Star Wars mythology mixed up.

It turns out that there is a problem with the "standard" Forth for the 65c02, FIG Forth -- it is old, as in, really old. Trying to learn modern Forth with it is like trying to learn modern English by studying The Canterbury Tales: Not impossible, but probably not a good idea.

But now is tyme to yow for to telle
How that we baren us that ilke nyght,
Whan we were in that hostelrie alyght;
And after wol I telle of our viage
And al the remenaunt of oure pilgrimage.

Strangely, nobody seems to be working on a modern version of a lesser-used programming language for a 30-year-old CPU. So I decided to write my own. Hence the headline.

And thus was born Tali Forth for the 65c02, which has now reached a late ALPHA stage (and is free to use, but at your own risk). The basics are there except for some more complicated words such as DOES> and ?DO, and some legacy words such as, yes, WORD. IF/ELSE/THEN and DO/LOOP work, and so does POSTPONE, which is one of those words that is really, really powerful, but will give you nightmares trying to understand it the first time.

(About the name: I always liked it, and it seems we're not going to have more kids. If it sounds familiar, you're probably thinking of Tali'Zorah vas Normandy, a secondary character in the Mass Effect universe. For the record, and EA's lawyers, this software has absolutely nothing to do with the game or the companies that made it.)

Screenshot of Tali Forth running on the py65mon emulator. Multiline formatting still sucks, and though DOES> is listed, it is currently just an empty dictionary entry. I should probably define KEELAH somehow.

I decided on four criteria for this implementation (see here for more detail):

Simple. Duh -- if I'm writing it, it will have to be simple. For this reason, I chose to use Subroutine Threaded Code (STC), where each word that is not native machine code is accessed by a JSR. Brad Rodriguez of CamelForth fame explains the differences between the various models nicely.

Specific. For the 65c02 only, and as close to the metal as possible for speed. Well, and because I really like to program assembler.

Standardized. Roughly following the ANSI Forth standard. Actually, I used the ANS Forth 200x draft document. Ahead of the curve!

Speedy. When forced to choose between speed and size, go for speed (within reason). It would still be nice to stay under 8 kbyte for the core routines so they fit in the ROM chip that starts at E000.

Current high-level commands. These will probably be joined by ?DO and +LOOP once I figure out how to code them. Long-term, I'll consider replacing the individual strings with a long one so you can just add further commands to the end and EVALUATE them without all this hassle.

So. What has writing 230 kbyte of (grotesquely over-commented) source code by hand for 5.9 kbyte (and counting) of assembled machine code taught me?

First, it is amazing how simple Forth really is under the hood. Once the basic interpreter loop is up and running, you just code word after word, and compiling stuff is a breeze. No wonder "Forthwrights" (a term said to be invented by Al Kreever) look down on the complexity of other languages. Understanding POSTPONE is the problem, not coding it.

Second, I have given up trying to read anything but the most trivial Forth code without drawing stack diagrams. As true as the previous statement is, a lot of the power of Forth is in the implicit rules -- the way the stack works, for instance -- which takes extra effort to understand. This fits with the reports that the inventor of Forth, Chuck Moore, prepares stuff before coding:

His programming style is thoughtful. He thinks about the problem a lot and writes very little code. He thinks it through again and rewrites the code. He objects to letting too much code accumulate from historic reasons and likes to rewrite. He is the most productive programmer I have ever seen yet he has only written about 15K of code in the last fifteen years.

Which could lead us to a discussion about the definition of "productivity", and the time pressure modern-day programmers -- the article is from 1998 -- are under. But we'll skip that here.

In the end, the combination of power and complexity means the Jedi joke isn't quite right: It should probably read "Sith" in the title. I have a definite feeling of having joined the Dark Side, where you can do amazing stuff with very little effort, but pay a definite price. So far, no cookies.

In the next entry, we'll discuss design details of Tali Forth. Some of them are, well, curious.