Stack-based programming with a human face

I Think many of you have found on the Internet articles and books about stack-based programming and the Forth language. First wave of enthusiasm: as all is simple, logical, clear and powerful! And why did these ideas have such a small spread? Why so few programmers actually use languages like Fort? After some time comes a wave of frustration: Yes, interesting idea, but hard to read source code, how dreary is working with variables, strings and fractional numbers! Interesting toy, useful group of bitolia, not more.

image

Often this is where it ends. But personally, I could not reconcile with the idea that graceful concatenative programming will remain in the shadow of other ideas. Yes, the difficulty with reading the source code. Yes, the syndrome of one line. Yes, every time to understand the algorithm you have in the imagination of the broadcast program and imagine the stack, reading the source code. But are those really flaws that is inherent to stack-based languages stack, and without which programming would no longer be a stack? Is there no way to at least mitigate these shortcomings and make life easier for programmers? It is possible and necessary!

Problem the first: the syndrome one of the strings


The phrase "a syndrome of one line" I first found the Barron in the "Introduction to programming languages". Although the term is not widespread, it speaks so eloquently to many languages.

The syndrome of a single line characteristic of those languages that allow for arbitrary structure of the program source code and have a short, and sometimes even single character keywords. Programmer is trying to "cram" into a single line as much as possible key words, which programs are not very readable. Especially clearly this is manifested in the syndrome of APL and its descendants, and Brainfuck, of course, in the Fort. Look again at the illustration at the beginning of the post and count how many on one line in the middle of the words.

But this is a solvable problem. In Forte syndrome one of the strings came through the preferences Moore, who loves short and meaningful English words. Unfortunately, these words quickly come to an end, and suffixes and prefixes (i.e. namespace on the knee) Moore does not like. So now the whole world is enjoying the simple characters in the spirit of "@! C@ C! /MOD CR \S P" is stuck in one line. The elementary problem is solved by choosing better words. Why write:
the
 FOR-EXAMPLE OVER OVER + ROT ROT - * ;

If I may:
the
define for exemple (a b -- {a+b}*{a-b})
over over
(a b -- a b a b)
+
(a b -- a b a+b=c)
rot rot
(a b c -- c a b)
- *
end

And even better:
the
define for exemple [a b -- r]
[a b - > a b a b]
+
[a b c (=a+b) -> c a b]
- /
end

But on this record will be discussed below.

Problem two: the code inside


Another complication of the management structure inside out. They are certainly delicate from the point of view of the idea and implementation, but such code is hard to read:

the
: sign-test ( n -- )
dup 0 < [
drop the "negative"
] [
zero? [ "zero" ] [ "positive" ] if
] if print ;

In this example, the code blocks and all the basic definition of the word at a glance. But is a little to complicate how the program will seem unreadable to the author in a week after writing.

To help hasten the usual if and while statements of imperative programming:

the
define sign-test [x]
[x -> x x]
0 > if
"Greater than zero"
else
"Less than or equal to zero"
end-ifprint-string
end

Maybe they are not as powerful and expandable, but very clear and practical. But if you want to create something like the arithmetic if of old Fortran, the code-block mechanism on top of the stack can be included in language as an additional element and use it not everywhere, but only where it is really needed and justified. Fortunately, you can eat such a mechanism is not asking, and implementation is not too complicated.
As for the Fort, then it is another problem: all the same words. Moore didn't want to add complex structures same final words like END, so IF, DO and other words must close their unique words. That's why we see all these IF ELSE THEN that introduce even the experienced programmers in a dead end. If we do not want to complicate the parser and the mechanism of words, why not enter words like END-IF? Here affecting Moore's dislike for suffixes and prefixes. But, like the first issue, this point is also easily solved and is not specific disadvantage of stack-based languages.

Problem three: interpretation in mind


In a number of program features on Forte and other stack-based languages are hard to read and understand their essence. The thing is that every time when reading a code block, which introduces a new word, we have to interpret in the imagination of the program and at each step to imagine what elements in which order are on the stack and how functions work with them. Needless to say, that it is sometimes very tiring and unproductive. But the most unpleasant is that, unlike the previous features that are historically so poorly formed, the need for such interpretation is the eternal companion stack-based programming.

Of course, to get rid of this shortcoming is impossible, but there are ways that can significantly facilitate the work of the programmer reading the source code. And most importantly, the first steps have already been taken. Indeed, the Fort took the programmers some notation for ease of reading source code:

the
( before-after )

These comments facilitate the provision of numbers on the stack, their number and order. No need to strain your imagination to understand how many numbers and in what sequence need to be placed on the stack before calling the function and how many numbers on the stack will remain in the calculations. But here's the mystery: if such comments are very clear and useful, why forth programmers write them only at the beginning of the definition of the word, and even then not always? What religion is stopping after a heap dup drop swap rot to write such an explanatory comment?

Back to the second code example. Of course, this is the case in a vacuum, but in real complex programs, such comments to the site:

the
define for exemple (a b -- {a+b}/{a-b})
over over
(a b -- a b a b)
+
(a b -- a b a+b=c)
rot rot
(a b c -- c a b)
- /
end

The programmer need not every time after all these swap, rot and + to simulate in my head the order of the numbers in the stack. Just need to run through the eyes. But here's the new problem:

the
(a b -- a b a b)

And
the
over over

similar. Just the first record made in a declarative style, and the second in the imperative. That is, the lines in the program is constantly duplicated. With the same success it is possible to speak about convenience of the assembler: left to write code with a bunch of mov and ret, and the right to review a=a+1, and then to talk about the ease of reading. Well, reading the comments. But can you contrive to write so that they are in any programming language will be read very easily and clearly. This, of course, does not mean that such languages convenient. They can be anything bad from the point of view of reading source code.

There is a natural desire to merge (a b — a b a b) over and over. Indeed, if we write comments in a particular notation, they can parse the compiler, and manually add the desired words permutations of the elements of the stack. Comments in parentheses are completely ignored and are intended only for the person. Comments in square brackets needs of both people and the compiler. The translator analyzes these comments in which a man writes what he wants, not how to come to such a result. That is, with a stack you can work in a declarative style.
Consider the basic, that is to say, the variety of such expressions:
1. define foo [b a r] or [b a r — f o o]
After the name of the new function in the square brackets you need to specify the number of arguments that need to be on top of the stack. If when calling any function that requires three arguments, the stack will have only two arguments, the traditional Fort system will give an error: stack is empty. But with such comments, the translator will be able to "see" review and determine the names of missing arguments: yeah, there is not enough argument r in the function call foo. Agree, this is much more informative than a vegetable "stack is empty". Of course, all this applies only to interactive mode, when the source code is available. After compiling these comments will disappear, they are only used for debugging and reading source code. For comparison, the error in gforth:

the
: add-x-y ( x y -- x+y ) compiled
+ ; ok
4 5 add-x-y . 9 ok
4 add-x-y . 
:6: Stack underflow
4 >>>add-x-y<<< .
Backtrace:
$7FF17383C310 + 
ok

2. [a b - > a]
The following types of expressions are different from purely descriptive word ->, which is similar to word in a classical review. If the number of elements left larger than the number of elements on the right, the translator comes to the conclusion that some items should be discarded, they are no longer needed. If the item is on top of the stack, such design is converted into a drop, but if it is not, the compiler first performs a permutation of the elements of the stack to the trash ended up on top of the stack, and then applied drop. Needless to say, as this notation makes life easier for the programmer in that case, if you want to delete, say, two elements from the middle of the stack. Let the compiler decide how to do it better, the programmer only describes what he needs.
3. [a b -> b a]
Such expressions denote the permutation of elements: number of elements left and right of the word -> same and the elements are the same. It remains only to choose the optimal scheme of permutations. Let it makes the car, people already so arranged that a long chain of swap drop over rot injected them into a stupor.
4. [a b - > a a b b]
The number of elements left is less than the number of elements on the right. This suggests that some elements need to duplicate. But what these elements are and in what order they should be placed, decides to let the translator. Man to think in terms of the swap dup rot dup uncomfortable.
5. [a b - > a b]
Nothing has changed, purely descriptive design. An example of application is given below.

Naturally, in such declarative expressions you can use regular comments:

the
dup rot +
[a b - > a b (a + b)]


Describe some words of Fort in a new form:
the
define dup [x]
[x -> x x]
end

define drop [x]
[x -> ]
end

define swap [x y]
[x y> y x]
end

define over [x y z]
[x y - > x y x]
end

define rot [x y z]
[x y z> y z x]
end


fourth Problem: floating-point numbers


Imagine a stack consisting of 32-bit cells that store integers. All this stack was good and the speed of arithmetic operations, and ease of operation with addresses of variables. But if we have to multiply 3.14 by 4? Where to put the decimal number? If you organize the stack as a set of 64-bit cells that store a floating-point number and store integers as decimal with a zero fractional part, then immediately evaporate such advantages as the speed of arithmetic operations.

We introduce a second stack for floating point numbers. Cell it will be, say, 64-bit, but they will be less. All numbers without a dot are placed on vertices of an integer (integer) stack, and all the numbers with a dot (albeit with a zero fractional part) are stored in fractional (float) stack. That is, we can multiply the number referred to in the following way:

the
4.0 3.14 *f

where *f multiplies a fractional number (similar to a +f, -f and so on).

We introduce a new declarative expression:
[stack: a b c -> stack: a b c]
Which takes elements from one stack and places them in another:
[integer: z1 z2 -> float: z1 z2] 2 times print-float end-times
Stack the stack is the number of 4.0 and 5.0. The reverse operation "cuts off" the fractional part.
Define new words:
the
define integer->float [x]
[integer: x -> float: x]
end

define float->integer [x]
[float: x -> intger: x]
end

Similarly, with a stack of returns (return stack).

The post turned out pretty voluminous and sometimes controversial, so much of the material will be in the second part. Again, ideas and criticism from the discussion, will adjust the terms of writing.
Article based on information from habrahabr.ru

Comments

Popular posts from this blog

Powershell and Cyrillic in the console (updated)

Active/Passive PostgreSQL Cluster, using Pacemaker, Corosync

Experience with the GPS logger Holux M-241. Working from under Windows, Mac OS X, Linux