For portable, imperative, low-level programming, my favourite programming language is still C, basically because there is no real alternative to it so far. There may be projects aiming at that, but a big advantage of C is simply its huge codebase and its portability, which mostly comes without having to do anything at all. For example, you can rely on most POSIX functions even under Windows, if you don't mind using the Cygwin DLLs. On the other hand, of course, C is not good for hacking. When writing software in C, one should have a clear idea of what one wants to code and how to structure it.
But I don't want to discuss this right now. C has its advantages and disadvantages, there is no doubt about that. What I want to point out is that C has a few disadvantages for which I do not understand why they still exist (or whether they still exist at all). I would like to list a few of them and explain why their continued existence puzzles me.
I will probably be wrong with some assumptions, since I have never actually looked at the code of a complex C compiler. So, if I am wrong, feel free to comment on this post.
Turing-Complete Macro System
The first thing C lacks is a Turing-complete macro system. The template system of C++ is, as far as I have heard, Turing-complete by now, but it is comparatively hard to actually code with it. I tried to understand it, but to me this system seems really ugly for actually programming something.
C has the #define directive, and most C compilers support inline functions (they have been part of the standard since C99). I try to use inline functions whenever possible, because they are easier to debug and seem to be the more modern approach. In some cases, however, they are simply not sufficient. Luckily, #define macros can take arguments. But there is no recursion, they are not Turing-complete, and there is no way (that I know of) to modify the contents of the arguments before putting them into the code, apart from stringifying them with # and pasting tokens together with ##.
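To make the trade-off concrete, here is a minimal sketch of the same operation written once as a function-like macro and once as an inline function, together with the two argument manipulations the preprocessor does offer. All names (MAX_MACRO, max_int, STR, CONCAT and the two helper functions) are made up for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Function-like macro: type-agnostic, but an argument may be evaluated twice. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Inline function: fixed type, each argument is evaluated exactly once. */
static inline int max_int(int a, int b) { return a > b ? a : b; }

/* The only built-in argument manipulations: stringify (#) and paste (##). */
#define STR(x) #x
#define CONCAT(a, b) a##b

/* Returns the value of i after MAX_MACRO(i++, 0), starting from i = 1. */
static int side_effects_with_macro(void) {
    int i = 1;
    int m = MAX_MACRO(i++, 0);   /* expands to ((i++) > (0) ? (i++) : (0)) */
    assert(m == 2);              /* the second i++ yields 2 */
    return i;
}

/* Returns the value of i after max_int(i++, 0), starting from i = 1. */
static int side_effects_with_inline(void) {
    int i = 1;
    int m = max_int(i++, 0);
    assert(m == 1);              /* i++ ran once and yielded 1 */
    return i;
}
```

The macro version increments i twice, which is exactly the kind of surprise that makes inline functions easier to debug. On the other hand, MAX_MACRO also works for doubles, which max_int does not.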
OK, so you could just add some kind of recursive expansion for #define macros, and maybe some basic string operations, to your C compiler. Then one could, for example, implement a simple JSON parser on top of that and have something comparatively powerful. This would be one solution that should not be too hard to implement, improves the usability of the compiler, and does not make the generated code slower, since everything happens at compile time. It would be better than what we have right now, but I would not really consider it a good alternative to the macro systems I know from Lisp.
A better possibility would be to actually use code-generating functions written directly in C. You could pass them their arguments either as strings or as some kind of syntax tree. A syntax tree would be better, since in most cases you would use a parser anyway, so why not use the one the compiler already has? One would have to find a good format for these syntax trees; there are a few possibilities, but that part should not be too hard. The function then returns either a string or a syntax tree, which is placed into the code at the point where the macro was called.
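As a rough sketch of the string-based variant, a code-generating function could look like the following. Today it would have to run as a separate build step; the function name gen_getter and the shape of the generated getter are purely hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical "macro" function: writes the C source of a getter for
 * struct_name.field into out. Returns the number of characters written,
 * or -1 if the buffer is too small. */
static int gen_getter(char *out, size_t cap, const char *type,
                      const char *struct_name, const char *field) {
    assert(out && type && struct_name && field);
    int n = snprintf(out, cap,
                     "static inline %s %s_get_%s(const %s *o) { return o->%s; }\n",
                     type, struct_name, field, struct_name, field);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Calling gen_getter(buf, sizeof buf, "int", "A", "id") fills buf with the source of a function A_get_id. A compiler with real macro support would splice that text (or better, the corresponding syntax tree) into the program at the call site.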
This is the possibility I would actually prefer. But of course, it is harder to implement. One main difference between C and most functional languages is that in C, compile time and run time are strictly separated, so you cannot just call functions you have already compiled while still compiling (which you can do in most Lisp dialects). One could consider this a misfeature of C compilers, and in fact it should be possible to have the same thing in C, too. On the other hand, many things are harder to optimize then, and since C is very low-level, this could cause a lot of problems with memory management inside the compiler: managing syntax trees, buffer overflows, and so on.
Maybe an alternative would be some directive, let's call it #macroinclude, with which you include a file that must explicitly be runnable (i.e. in most cases compiled) before the actual file can be compiled, because the actual file relies on the macros defined in it. The compiler could then take that file, compile it (maybe with fewer optimizations), and use it for compiling the actual file. To protect the compiler from bugs in the macro code, it could fork and let only one subprocess actually call these functions, while the other process asks it for macro expansion whenever needed.
Multi-Line Strings
I don't see any reason why it should not be possible to write a string literal that contains newlines with actual line breaks in the source, i.e. a string literal spanning multiple lines, as can be done in Scheme and Common Lisp, for example. It should not be much harder to parse (in fact, I think it should be easier).
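For comparison, these are the two workarounds standard C offers today: adjacent string literals are concatenated, and a backslash at the end of a line continues the literal. Neither lets you type a plain line break inside the quotes; the newline still has to be spelled \n. The variable names and the helper count_lines are made up for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Adjacent string literals are concatenated by the compiler. */
static const char *const help_concat =
    "Usage: tool [options]\n"
    "  -h  show this help\n"
    "  -v  print version\n";

/* A backslash at the end of a line continues the literal. Everything at the
 * start of the next line, including indentation, becomes part of the string. */
static const char *const help_continued = "Usage: tool [options]\n\
  -h  show this help\n\
  -v  print version\n";

/* Counts the newline characters in a text. */
static size_t count_lines(const char *text) {
    assert(text != NULL);
    size_t n = 0;
    for (const char *p = strchr(text, '\n'); p != NULL; p = strchr(p + 1, '\n'))
        n++;
    return n;
}
```

Both variables hold the same three-line text. The first form is the one usually seen in practice, since it can be indented freely.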
Compile-Time Type-Dispatching, Operator Overloading
Actually, C++ already has these features in some cases. But C apparently doesn't. And of course I see a simple reason for that: functions in shared libraries must have a predefined, unique name. On the other hand, one could easily define a simple naming scheme for such functions. The reason one may want this is that it is extremely useful and can enhance code readability. Say, for example, you want to define an additional numeric type, say countable ordinals (I explicitly don't use complex numbers here, because there are C compilers supporting them, as far as I have heard), and you want to define addition for them. You would have to add a new function name like "addOrdinals". But it also makes sense to add integers to ordinals, since integers are a special case of ordinals. So you would also have to add a function "addOrdinalToInteger". With compile-time type dispatching, you would be able to just define a function "add" twice, with different types. With operator overloading, you could also add an additional definition for +, which can enhance readability further. Important: all of this can (and should) be done at compile time.
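Since C11, the _Generic selection gets part of the way there: it picks an expression based on the static type of its argument, entirely at compile time. The following is only a sketch. The Ordinal type (restricted to ordinals below omega squared), the four helper functions and the add macro are assumptions made for this example, and the separately named functions still have to exist underneath.

```c
#include <assert.h>

/* Hypothetical representation of the ordinals below omega^2: omega*w + n. */
typedef struct { unsigned w; unsigned n; } Ordinal;

static Ordinal ord_add_ord(Ordinal a, Ordinal b) {
    /* The finite part on the left is absorbed by an infinite right operand. */
    Ordinal r = { a.w + b.w, b.w > 0 ? b.n : a.n + b.n };
    return r;
}

static Ordinal ord_add_int(Ordinal a, int k) {
    assert(k >= 0);
    Ordinal r = { a.w, a.n + (unsigned)k };
    return r;
}

static Ordinal int_add_ord(int k, Ordinal b) {
    assert(k >= 0);
    Ordinal r = { b.w, b.w > 0 ? b.n : (unsigned)k + b.n };
    return r;
}

static int int_add_int(int a, int b) { return a + b; }

/* Dispatch on the static types of both arguments. */
#define add(x, y)                                                          \
    _Generic((x),                                                          \
        Ordinal: _Generic((y), Ordinal: ord_add_ord, int: ord_add_int),    \
        int:     _Generic((y), Ordinal: int_add_ord, int: int_add_int)     \
    )((x), (y))
```

With Ordinal w = {1, 0}, the call add(3, w) yields omega, while add(w, 3) yields omega + 3, so the dispatch also respects that ordinal addition is not commutative. Overloading the + operator itself remains out of reach.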
User-Defined Infix Operators
Even though I like Lisp, in languages like C that are not designed for coding with prefix operators, additional infix operators can be a nice thing to have. Many languages (Haskell, SML, and afaik also Perl 6) have them, so it should not be too hard to add them to C.
Structure Introspection
C has no type information at run time. And that's a good thing, I suppose. Whether or not to have run-time type information is a design decision, and since C is low-level, you want to be able to do such "bad" things as casting integers into strings to send them through sockets. With a Turing-complete macro system you could of course add something like type information to some structures if you want it, but C is a low-level language, and it should stay that way. If I really needed a complex object system, I would maybe switch to GLib if I had to use C, or use Common Lisp.
Anyway, structure introspection is something that could easily be provided, at least at compile time, for example inside macros, but I actually don't see why not also at run time. For any structure A, you could, for example, add a function void* readFieldA(A* object, char* fieldname) which returns a pointer to the beginning of the given field of the given object. This is a simple function that could easily be implemented. If you already have compile-time dispatching as mentioned above, you would not even need a separate function name for each type. Additionally, one might like to have a function like size_t fieldLength(A* object, char* fieldname) to get the length of the field, and functions int fieldNum(A* object) and void getDeclaredFieldNames(A* object, char** names) to get the number and the names of all declared fields.
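Such functions can be approximated by hand today with offsetof and a field table, which gives an idea of what the compiler would have to generate. In this sketch the struct A, the FieldInfo type, the FIELD macro and the table A_fields are assumptions; the function names follow the ones suggested above, with an A suffix since there is no dispatching.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { int id; double weight; char name[16]; } A;   /* example struct */

typedef struct { const char *name; size_t offset; size_t size; } FieldInfo;

#define FIELD(type, member) \
    { #member, offsetof(type, member), sizeof(((type *)0)->member) }

/* Written by hand here; this is the table a compiler could emit automatically. */
static const FieldInfo A_fields[] = {
    FIELD(A, id), FIELD(A, weight), FIELD(A, name)
};

static const FieldInfo *findFieldA(const char *fieldname) {
    assert(fieldname != NULL);
    for (size_t i = 0; i < sizeof A_fields / sizeof A_fields[0]; i++)
        if (strcmp(A_fields[i].name, fieldname) == 0)
            return &A_fields[i];
    return NULL;
}

/* Pointer to the start of the named field, or NULL if there is no such field. */
static void *readFieldA(A *object, const char *fieldname) {
    const FieldInfo *f = findFieldA(fieldname);
    return f ? (char *)object + f->offset : NULL;
}

/* Size of the named field in bytes, or 0 if there is no such field. */
static size_t fieldLengthA(A *object, const char *fieldname) {
    (void)object;   /* only the static type matters */
    const FieldInfo *f = findFieldA(fieldname);
    return f ? f->size : 0;
}

static int fieldNumA(A *object) {
    (void)object;
    return (int)(sizeof A_fields / sizeof A_fields[0]);
}
```

The caller still has to know the field's type to use the returned pointer, for example *(int *)readFieldA(&a, "id"). The table approach also cannot describe bit-fields, because offsetof and sizeof are not applicable to them.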
Well, this is one possibility I could think of. One of many. A problem with it arises when it comes to things like unions or bit-fields. For these, it might be better to let readField return an appropriate function pointer, to which you can pass the given object to extract the desired data, instead of a pointer to the data itself.
Templates
Templates are a nice thing to have. As far as I know, there are already plans to add them to a new C standard. And well, why not?
Lambda-Forms, Closures
Clang, GCC, and some new C++ drafts are all trying to define a lambda syntax and lambda forms for C/C++. It should not be too hard to get something like that. At the moment, I like clang's block syntax most, but it is still very complicated to use, imho. What's the problem with a syntax like just lambda (arglist) { <code> ; return <...>; }? The type can be inferred by looking at the return statements. Lambda forms are not meant for huge functions; normally they should stay small enough that not only the compiler but also the person writing the code can determine their type. And what about the ugly function pointer syntax, which clang's block syntax has as well? Why can't there just be type declarations like (int,int->int) or some similar syntax? In the end, it's just syntax: for the parser it should not be much additional work, and for the programmer it should become easier to code.
With lambda forms comes, of course, the problem of closures. Small objects like integers can be copied directly into the code of the lambda form (or into some additional data structure representing the initial stack frame). But sometimes you have pointers to work with, pointers to stack objects that may already be gone when the function is called. That is a problem for which I don't know a proper solution (except changing the memory management, which is something you usually don't want when working with C). Well, as with all structures you define in C, you will have to take care yourself that your pointers stay valid. I guess that's the price you have to pay for low-level programming. In many cases, the compiler should be able to detect whether a pointer points into the stack or to something else, so it could warn the programmer.
Tail Recursion, Loop-Recur
GCC also supports tail call optimization, and in my opinion this should be the default, since I see no reason why one would not want to have it in most cases. For the few cases in which it makes sense to turn it off, one could add a keyword or compiler flag to suppress it.
In addition, it would be nice to have some loop-recur statement as Scheme and Clojure have, like loop (arglist) { <code>; recur args; <code>; return ...;}. I think it would be convenient to have two possibilities: either recur (arguments) as a function call (for the cases where you don't have tail recursion, which may sometimes occur), or recur arguments as a replacement for return recur(arguments), but that's a matter of taste. And of course, you could do the same thing (except for non-tail calls) using a while loop, but that's not the point.
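To illustrate what such a statement would save, here is the same computation once as a tail-recursive function and once with the recur step written out by hand as a loop. Both function names are made up, and the factorial is just a convenient example.

```c
#include <assert.h>

/* Tail-recursive form: the recursive call is the last thing that happens. */
static unsigned long fact_rec(unsigned n, unsigned long acc) {
    assert(n <= 12);                /* 12! still fits into 32 bits */
    if (n <= 1) return acc;
    return fact_rec(n - 1, acc * n);
}

/* The same computation with the "recur" spelled out manually. */
static unsigned long fact_loop(unsigned n) {
    assert(n <= 12);
    unsigned long acc = 1;
    for (;;) {                      /* loop (n, acc) */
        if (n <= 1) return acc;
        unsigned long next_acc = acc * n;
        n = n - 1;                  /* recur (n - 1, acc * n): rebind ... */
        acc = next_acc;             /* ... all loop variables, then jump back */
    }
}
```

The loop version needs a temporary so that all new values are computed from the old ones before any variable is overwritten. A recur statement would do that rebinding for you.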
Namespaces
I never quite understood why there are no namespaces in C. Namespaces have, in my opinion, nothing to do with the fact that the language is low-level. Namespaces are useful for structuring code, and they are something that can be handled at compile time. The main problem would be which naming scheme to use in libraries. But most systems already have common naming schemes for that, since as far as I know most systems have C++ compilers, and different C++ compilers mostly try to behave similarly to each other, so I don't see any problem with that.
Another possibility would be to just define prefixes of names that can then be left out when writing the names. In the end, that's about what namespaces are. It's not that beautiful, but who cares, it's simple.
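With the preprocessor, a crude version of this already works: the library uses prefixed names throughout, and client code that wants the short names defines aliases for them. The geom_ prefix and all names below are hypothetical.

```c
#include <assert.h>

/* "Library" side: every public name carries the geom_ prefix. */
typedef struct { int x, y; } geom_point;

static geom_point geom_add(geom_point a, geom_point b) {
    geom_point r = { a.x + b.x, a.y + b.y };
    return r;
}

static geom_point geom_div(geom_point p, int d) {
    assert(d != 0);
    geom_point r = { p.x / d, p.y / d };
    return r;
}

/* "Client" side: opt in to the short names, a poor man's using-declaration. */
#define point geom_point
#define add   geom_add
```

After the two #define lines, add((point){1, 2}, (point){3, 4}) calls geom_add. The drawback is that the aliases are plain text substitution: they apply to the whole rest of the file and know nothing about scopes, which is what a real language feature could fix.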
Conclusion
So these are a few simple language extension requests from me. I think most of them would not break the spirit of C but would be extremely useful. Discuss!