Help Wanted!
The Unicon project is looking for help on the following topics as of
2/25/2021. Thanks to Hugh Sasse for improving this page.
Most of these topics are requests from the user community. Many of these would
make excellent independent study or thesis topics. Net volunteers are also
welcome. Anyone willing to pay for any of these projects should drop me a
note, and I will hire appropriate students to do it at their bargain wage
rates.
An asterisk (*) in the title indicates that somebody is believed
to be working on that topic, or has implemented a feature not yet
adopted in the Unicon baseline. (S) after the title indicates that
this project would be particularly suitable for a student volunteer.
The Unicon translator (unicon) is written in Unicon and has
evolved over time.
Purpose: core language. Skills needed: Unicon, compilers.
- Common Optimization (S) --
Although they could be done in either the icont or iconc back-end
compilers written in C, the Unicon translator is an appropriate place to
implement a number of classic optimizations such as constant folding
and common subexpression elimination. Also needed is
strength reduction, not only for classic arithmetic operations
but for string processing; for example, changing upto('s') to find("s").
Also, there are obvious run-time conversions that can be avoided by
performing them at compile time, such as changing write(1) to write("1")
and the like.
- Case Expression optimizations --
The Unicon VM implementation needs a more efficient implementation of
case expressions in the common cases in which case branches are known
at compile time to be integer or string constants. We should devise a
working optimization for integer labels, akin to C switch statements.
Then we should devise a string label case optimization, possibly using
hash numbers and the integer optimization. Compare with Java's
tableswitch and lookupswitch instructions; we might want something
similar to those. Should hashing the value we are looking for be implicit
in those instructions? Or a separate instruction? Good question.
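As a rough illustration of the two case optimizations described above, here is a minimal sketch in Python rather than in the VM's C code. The names build_int_switch and build_string_switch are hypothetical and not part of Unicon. The integer version precomputes a dense jump table, loosely like tableswitch; the string version hashes the labels and resolves collisions with an equality test, loosely like lookupswitch applied to hash numbers. In this sketch the hashing is implicit in the dispatch step, which is only one of the two designs the text asks about.

```python
# Hypothetical sketch: these names are illustrative, not part of Unicon or its VM.

def build_int_switch(branches, default):
    """Precompute a dense jump table for integer case labels."""
    lo, hi = min(branches), max(branches)
    table = [default] * (hi - lo + 1)
    for label, target in branches.items():
        table[label - lo] = target

    def dispatch(value):
        # One range check and one index, instead of testing each label in turn.
        # bool is excluded because Python treats True/False as integers.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if is_int and lo <= value <= hi:
            return table[value - lo]
        return default

    return dispatch


def build_string_switch(branches, default):
    """Group string labels by hash number, then compare only within a bucket."""
    buckets = {}
    for label, target in branches.items():
        buckets.setdefault(hash(label), []).append((label, target))

    def dispatch(value):
        if not isinstance(value, str):
            return default
        # Hashing is implicit here; colliding labels are separated
        # by a final equality test.
        for label, target in buckets.get(hash(value), ()):
            if label == value:
                return target
        return default

    return dispatch


int_case = build_int_switch({1: "one", 2: "two", 5: "five"}, "other")
str_case = build_string_switch({"red": "stop", "green": "go"}, "unknown")
print(int_case(2), int_case(3), int_case(99))
print(str_case("green"), str_case("blue"))
```

A dense table only pays off when the labels are closely spaced; for sparse integer labels a sorted lookup, as in lookupswitch, would likely be the better fit.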
Unicon source distributions include Icon's optimizing compiler iconc,
which is built separately ("make Uniconc") and accessed using "unicon -C"
on Unix-based systems with an available C compiler.
Purpose: faster execution. Skills needed: C expert.
- Dead Code Elimination (S) --
Iconc may need to remove unreferenced procedures and classes
prior to type inferencing in order to speed up compilation and reduce memory
requirements.
- Compiler Optimization (S) --
Iconc can use further optimization of its generated code.
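The dead code elimination item above amounts to a reachability computation over the call graph. The following is a minimal sketch in Python, not iconc's actual code; the call graph format and the function names are assumptions made for illustration. A real pass would also have to treat procedures that may be invoked by name at run time conservatively.

```python
# Hypothetical sketch: the call-graph format and names are illustrative only.

def reachable_procedures(call_graph, roots):
    """Return the set of procedures reachable from roots in call_graph."""
    seen = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(call_graph.get(name, ()))
    return seen


def eliminate_dead_code(call_graph, roots=("main",)):
    """Keep only the procedures that some root can reach."""
    live = reachable_procedures(call_graph, roots)
    return {name: callees for name, callees in call_graph.items() if name in live}


call_graph = {
    "main": ["parse", "report"],
    "parse": ["lex"],
    "lex": [],
    "report": [],
    "old_debug_dump": ["lex"],  # never called from main
}
pruned = eliminate_dead_code(call_graph)
print(sorted(pruned))
```

Running the pruning step before type inferencing means the inferencer never has to visit old_debug_dump at all, which is where the compile time and memory savings would come from.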
Much of the Unicon virtual machine was written in C in the early 1980's. The
VM runtime system was altered in the early 1990's to use an extended-C
syntax called RTL (runtime language). This allowed it to be used for both
Iconx and Iconc.
Skills needed: C expert.
- VM optimization -- The virtual
machine translator (icont), and the interpreter and runtime system
(iconx) can be further enhanced for better performance. For example,
the memory allocations performed during most string subscripts can be
avoided with a relatively simple addition of a new virtual machine
instruction. A recent undergraduate research assistant added the new
virtual machine instruction; what is needed now is a modification to
icont to make use of it.
As another example, in many or most calling contexts,
the translator can identify when a generator cannot be resumed. If this
information were passed into the invocation, a suspend might be promoted
into a (much faster) return expression.
- VM translator type inferencing --
The type inferencing mechanism used by iconc has been sped up to the
point where type inferencing could be used to direct VM optimizations,
not just C-compiles.
- Compact table representation (S) --
If a program needs millions or billions of small tables, the
memory overhead of such tables becomes important. If HSegs is 20, one
would seemingly be able to store tables of size <= 5 in a single Table
block instead of allocating an array of buckets.
- Dynamic Interpreter Stack --
Short of infinite recursion,
it should be almost impossible to cause an interpreter stack overflow.
Perhaps Icon's own list data type could be used to implement a
dynamic interpreter stack. Alternatively, checking and realloc'ing
the interpreter stack might be made to work.
- VM dynamic code -- The dynamic loading facility built into
Unicon needs to be supplemented and extended with dynamic linking to
allow new code to be generated and executed on the fly within an
existing program execution context.
- Portable bytecode -- It would be nice if Unicon executables
could be delivered in a machine-neutral format, similar to the Java VM.
- cset keyword conversions (S) -- Are keywords such as &lcase
converted to strings often enough to warrant special cases in the cset
conversion code?
- Avoid one-char allocations (S) -- Many functions such as map()
need not allocate a string from the heap when the result is
of length 1; they could just return a pointer to that character in
static memory.
- Improve large integer performance (S) --
Recent benchmark testing has suggested that Unicon might gain substantially
in performance by switching to use a popular multi-precision arithmetic
library such as GMP in place of the current large integer implementation.
- Improve large integer string conversion (S) --
A large integer such as 5^4^3^2 takes a long time to convert to a string:
more than 4 minutes on an older amd64 machine! This could be made much faster
by reimplementing large integers using GMP (above) or altering the
representation to be base-10-compatible on a per-largeint-chunk basis.
- Improve random number generator* (S) --
Unicon should consider changing its random number
generator to use a Mersenne twister. What are the pros and cons?
A first step would be to acquire or develop an implementation that
is suitable.
- Unicode Support* (S) --
Unicon does not have a character type; it has strings which are currently
represented internally as a sequence of 8-bit characters. Unicon needs
Unicode support. We have a design document describing some of the data
structures and steps needed to add support for Unicode. Skills needed:
C expert; character encoding experience a plus.
*Status: work is being done on UTF-8 support, and help would be welcome.
- system() Cleanup (S) --
Unicon's system() function has proven to be a portability challenge, where
over time feature creep and platform-specific extensions have led to
multiple implementations in different locations. This should all be in one
place, and the code should be cleaned up.
- Operator Overloading (S) --
Unicon's experimental operator overloading facilities are experiencing
increased demand, but before they can become part of the language proper,
they need a thorough test suite, and their portability on various platforms
needs to be demonstrated. Some bugs were reported on 32-bit platforms.
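To illustrate the "base-10-compatible on a per-largeint-chunk basis" idea from the large integer string conversion item above, here is a minimal sketch in Python. It is not Unicon's large integer code: the chunk size of 9 decimal digits and all function names are assumptions. When each chunk holds a value below 10^9, conversion to a string needs no long division, only formatting each chunk and concatenating.

```python
# Hypothetical sketch: chunk size and names are illustrative assumptions.

CHUNK_DIGITS = 9
CHUNK_BASE = 10 ** CHUNK_DIGITS


def to_chunks(n):
    """Split a non-negative integer into base-10**9 chunks, least significant first.

    This uses Python's own big integers only to set up the demonstration.
    """
    if n == 0:
        return [0]
    chunks = []
    while n:
        n, low = divmod(n, CHUNK_BASE)
        chunks.append(low)
    return chunks


def add_chunks(a, b):
    """Add two chunk lists, carrying between chunks."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = carry + (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        carry, low = divmod(total, CHUNK_BASE)
        result.append(low)
    if carry:
        result.append(carry)
    return result


def chunks_to_string(chunks):
    """Convert in time linear in the number of chunks: format each one separately."""
    head = str(chunks[-1])
    tail = "".join(f"{c:0{CHUNK_DIGITS}d}" for c in reversed(chunks[:-1]))
    return head + tail


x = 123456789012345678901234567890
y = 987654321098765432109876543210
total = add_chunks(to_chunks(x), to_chunks(y))
print(chunks_to_string(total))
```

The trade-off is that arithmetic on decimal chunks is somewhat slower and less compact than on binary chunks, so whether this beats switching to GMP would need benchmarking.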
We have gone to considerable effort (like, a Ph.D. dissertation) to enable
the authoring of advanced tools for Unicon in Unicon.
Our debugging facilities need to be extended to be able
to handle newer language features.
Skills needed: Unicon expert.
- Unicon Debugger Enhancements --
The monitoring facilities described in the book
"Program Monitoring and Visualization" have been used to produce an
extensible source-level debugger, udb.
This debugger can use further refinement.
For example, it needs to support co-expressions, threads and patterns,
and needs better support for classes and objects.
- Unicon Profiler (S) --
A good profiler would report time and space information about Unicon program
executions, including runtime system time and space, not just source
code modules' time and space. Line-level and built-in-level details are
needed. *Status: a simple profiler prototype named uprof was developed by
a student as a semester project. It is useful enough that it has made it
into the language distribution, but needs further refinement.
&time has been made more precise on Linux using high resolution
timers, but perhaps a different API is needed to report separately on
user time and system time.
- Unicon Lint --
A "lint" for Unicon would detect bugs and probable bugs
by static analysis. For example, it could flag redundant or repeated
type conversions.
Fonts are an important aspect of widening Unicon's suitability to more
applications. They are at present the single biggest obstacle to portability
across platforms.
Skills needed: C expert.
- Unicon Freetype* (S) --
Unicon should add support for the Freetype font engine and provide a set of
portable fonts that match, pixel-for-pixel, on all window systems.
*Status: basic freetype support was added to 3D facilities. 3D text
support needs further testing and development.
Freetype support is in Gigi Young's OpenGL-based 2D facilities.
It could be extended to work with the legacy 2D graphics (X11 or
MSWin or both), and needs further development.
- Unicon Native Fonts --
It would be nice if Unicon could add new fonts dynamically, in order
to support interesting languages that are not well supported by
operating systems.
- Unicon Deadkeys (S) --
The iconx X11 client code should be updated to use X11R5+ support for
locales and "dead keys" to compose accent characters using XmbLookupString
and/or the LC_CTYPE stuff.
Skills needed: Unicon expert and/or C expert.
- Class Variables --
Some additional syntax is needed to make it more convenient to declare
variables that are shared among all instances of a class; possibly a "class
static" storage class. Currently you
can achieve this effect using globals and packages, and method static
variables are shared among instances, but a more direct syntax would
be handy.
- Instance Variable Initializers --
Someone added a "local" syntax for class instance variables, as an
alternative to the syntax based on record declarations. But that
local syntax apparently does not support initializers correctly.
- Private and Read-Only-Publics --
Unicon's predecessor Idol had private semantics and a public keyword.
Private semantics were dropped because they added to complexity and
space consumption without adding functionality. But arguably they
have value and should be an option. While a distinction between
private and protected does not seem very useful in Unicon, a scope
that would be really useful would be a read-only public designation,
to avoid the need for many accessor methods.
Iconx needs to be extended to support directly executing .icn source files.
It also needs support for "one-liners", where the source code is supplied as a
command-line option. Icon 9.5 added some support for this on UNIX;
for Unicon we need a multiplatform solution if possible.
Better programming tools are always in demand. An interactive interpreter,
or an incremental compilation system, would make an excellent project.
There are several ways to execute new Unicon code on the fly that was typed in
interactively:
- using system() -- slow, doesn't pass non-string parameters easily
- using load() -- but load() does not "link" into the current program, and
currently does not support calling procedures in another program directly; one
would have to use a co-expression to change control to the other "program" and
then call a desired procedure via some wrapper code. Also, load()'ing a lot may
have garbage collection issues that haven't been discovered yet.
- Undergrad-level project: develop a "library" model for Unicon modules,
calling them through a co-expression interface using wrapper procedures
- develop a new mechanism for linking and loading COMPILED Unicon code
as a .so/.dll per loadfunc()
- developing a pure interpreter -- for strings or syntax trees constructed
from a parse of the code.
As an experiment, I wrote a little program that reads lines from the user,
and for each one, calls an eval(s) function that writes it to a file,
compiles it, uses load(), and activates it. This is "slow", but runs in
well under a second, so it is not obvious that we have to discard unicon/icont
on modern machines. Handling stored procedures and globals in such an
interpretive environment requires more thought, but still seems doable, and
would be useful to experimenters and new users.
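The shape of that experiment can be sketched by analogy in Python. This is not the Unicon program described above, and Python's runpy module merely stands in for the "compile, load(), activate" steps; the names eval_snippet and repl are hypothetical. Each input line is written to a temporary file, compiled and run from that file, and the resulting globals are carried forward to the next line, which is one simple answer to the question of how globals might persist between inputs.

```python
# Hypothetical analogy in Python; not the Unicon eval(s) experiment itself.
import os
import runpy
import tempfile


def eval_snippet(source, env=None):
    """Write source to a temporary file, compile and run it, return its globals."""
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(source)
        # run_path compiles the file and executes it in a fresh namespace
        # seeded with env, loosely playing the role of load() plus activation.
        return runpy.run_path(path, init_globals=env)
    finally:
        os.remove(path)


def repl(lines):
    """Evaluate each line in turn, carrying user-defined globals forward."""
    env = {}
    for line in lines:
        result = eval_snippet(line + "\n", env)
        env = {k: v for k, v in result.items() if not k.startswith("__")}
    return env


session = repl(["x = 2", "y = x * 21"])
print(session["y"])
```

An interactive version would read the lines from the user instead of from a list; the file round trip per line is the "slow" part that the paragraph above argues is acceptable on modern machines.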
Udaykumar Batchu performed a project to simplify the calling of
C functions from within the runtime system, improving on the traditional
Icon loadfunc() dynamic loading utility. His work needs some refinement,
and student Vincent Ho suggested an "inline C" capability that would fit
in nicely. It would
be interesting to add such a capability to the compiler and
to the interpreter.
Skills needed: Unicon and C expert.
It would be nice if Unicon programs could call Python. It would be nice if
Python programs could call Unicon. It would be nice if we figured out ways
to make this fairly easy to do.
Skills needed: Unicon/C/Python expert.
It has been requested that we make the interpreter embeddable within
C/C++ applications. Developing a standard mechanism for turning the
Unicon VM into a callable C library would make an interesting project.
Skills needed: C expert.
Unicon has a test suite that covers many of the language
features. However, the test suite can be improved in two ways: first,
extending the tests to cover more features, especially the new ones that are
under-represented. Second and more importantly, automating multi-platform
testing. With lightweight virtualization technologies such as LXC, Unicon
binaries can be quickly built and tested on a variety of platforms with a
mix of 32/64-bit and x86/arm guests to ensure no build problems or bugs.
Skills needed: Unicon Language, scripting and virtualization.
The graphics facilities would benefit from multiplatform printing support,
including the generation of PostScript or PDF. The database facilities
would benefit from a report generator similar to Crystal Reports.
Skills needed: C expert.
The messaging facilities done by Steve Lumos support popular protocols such
as HTTP and POP. One thing we need to do is port these from UNIX to Win32.
Another thing we need to do is add protocols. We have simple SSL support,
but it may need enhancement. A critical extension for e-mail support is
SMTP AUTH, the authenticated
version of the SMTP protocol. We also need FTP, IMAP, NNTP, ...
Single-platform enhancements are uninteresting to users on other platforms,
but occasionally they are necessary or useful in making Unicon suitable for
applications that it otherwise would not be used in.
Skills needed: C expert.
There are currently 11 Windows-native functions in the Windows versions
of Unicon, implementing buttons, scrollbars, menubars, edit regions, and
various dialogs. A larger set of Windows-native GUI capabilities might
allow applications to look more "native" on Windows and be usable by
screen readers.
One of the requested Windows-specific features is COM support.
The technical questions are: (a) is a platform
independent interface possible (to support CORBA or JavaBeans as well,
for example), and (b) how high-level can we make this API?
Porting iconx to be an Active Script Engine (at one time documented in the
"Visual Programmer" column from Microsoft Systems Journal online) would
allow Icon to be an embedded scripting language for many Windows
applications.
Skills needed: C expert.
Additional means of automating the transmission of structured or
binary data would be valuable to Unicon -- Google's Protocol Buffers
are an example.
Skills needed: C expert.
New platforms of particular interest are smartphones and tablets.
There was at one time a preliminary WinCE port including much of
the 2D graphics facilities. It might or might not be of any use
for a modern Windows phone port. It needs extensions in several areas,
such as networking, and a strategy for adapting existing Unicon GUIs
to the small screen (touch support, enhanced screen size portability).
*Status: a current Java-based Unicon implementation project may be a
starting point for a new effort here.
Skills needed: Java proficient.
Our sister site, junicon.sourceforge.net, is the host for a new
Java-based implementation of a Unicon-like language called Junicon.
The Junicon translator actually translates to either Java or Groovy,
and can target either compiled bytecode or interactive interpretation.
What Junicon needs in order to be publicly released is a runtime
system; specifically, a large number of built-in functions, including
I/O facilities, to enable it to run typical Icon and Unicon programs
unmodified. *Status: many built-in functions have been developed by
Junicon's author, and more were implemented by an undergraduate
research assistant. Subsequently, a UIdaho senior design project
developed a large number of the unimplemented features for Junicon.
This work needs to be merged into the main Junicon repository.
Another effort to develop a Java+libGDX implementation of Unicon's
graphics facilities probably needs to be revisited in light of the
Unicon OpenGL 2D graphics implementation.
Unicon should handle common archive and compressed archive
formats such as .zip as easily as it does other file types.
Skills needed: C intermediate.
It would be useful to add I/O modes in which arbitrary structure
values (tables, objects, etc.) could be written to and read from disk,
making something like encode()/decode() a built-in.
Skills needed: C intermediate.
Unicon's structure types are all mutable, making them next to useless as
hash keys. Adding a "freeze" bit, a promise from then on that a structure
would never be modified, would enable them to hash on contents instead of
on serial number, and might enable various optimizations.
Skills needed: C intermediate.
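The "freeze" bit idea can be sketched as follows, in Python rather than in the runtime's C code. The class FreezableList is hypothetical; id() stands in for Unicon's serial number, and the sketch assumes the elements themselves are hashable. Before freezing, a structure hashes and compares by identity; after freezing, it hashes and compares by contents and rejects modification.

```python
# Hypothetical sketch: FreezableList is illustrative, not a Unicon type.

class FreezableList:
    def __init__(self, items=()):
        self._items = list(items)
        self._frozen = False

    def freeze(self):
        """Promise that this structure will never be modified again."""
        self._frozen = True
        return self

    def append(self, item):
        if self._frozen:
            raise TypeError("structure is frozen")
        self._items.append(item)

    def __eq__(self, other):
        if not isinstance(other, FreezableList):
            return NotImplemented
        if self._frozen and other._frozen:
            return self._items == other._items  # compare by contents
        return self is other                    # compare by identity

    def __hash__(self):
        if self._frozen:
            return hash(tuple(self._items))     # hash on contents
        return id(self)                         # stands in for the serial number


a = FreezableList([1, 2]).freeze()
b = FreezableList([1, 2]).freeze()
c = FreezableList([1, 2])  # same contents, but not frozen

lookup = {a: "found"}
print(lookup[b])    # b is a different object with equal frozen contents
print(c in lookup)  # unfrozen structures still behave as distinct keys
```

One design wrinkle the sketch exposes: a structure's hash changes at the moment it is frozen, so a structure already used as a key would need rehashing, or freezing would have to be disallowed once it is stored in a table.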
Skills needed: C expert. Graphics API expert.
- DirectX port (S). Such a port would enable Unicon to run on
Windows-based platforms that do not support OpenGL well (Vista? Xbox?).
A recent effort to develop a DX11 port resulted in a large C++ library
that would entail an integration effort in order to put into Unicon.
*Status: an M.S. thesis resulted in a C++ library that might be a
starting point for a DirectX 11 port. This work was built atop MSVC,
which is no longer a supported compiler for Unicon. It is unclear
whether Mingw64 fully supports DirectX and vice versa, since
Microsoft typically uses proprietary APIs to preserve their
hegemony.
- When a window (possibly 2d, offscreen) is used as a Texture on a
3D object, it should be updated with current contents every time
the 3D object is redrawn. :-)
- Unicon should support PNG as a standard graphics format.
*This has been implemented but seems to behave incorrectly on some images.
- Subwindows (at least) should support a borderwidth attribute,
and have the option of having no border. Perhaps main windows too.
- Mac - We need a Macintosh programmer,
proficient in (or willing to learn) native Mac graphics APIs,
to complete a Mac port of the graphics facilities. At one time,
prototyping for
this effort showed that a Cocoa GUI thread creating and calling a VM
thread was the right way to organize this execution model.
It is unclear what API to use for a long-term Mac port at this point.
The rise of multi-core CPUs made it inevitable that Unicon should be
extended to support parallel computation, so it was. The interesting
question now is whether it can be extended to support implicit parallelism.
Skills needed: C expert
- Concurrency in unicon -C* --
A UIdaho senior project resulted in preliminary support for
concurrent features under the optimizing compiler. *Status: it
works on many programs, but needs further testing and debugging.
- DataParallel Operators* --
Unicon should support (deep) structure-at-a-time operators, such as
L1+L2 producing a list L3 with elements of L1 pairwise-summed with L2.
*Status: experimental modifications to support element-wise addition
are in the runtime system under the #ifdef symbol DataParallel.
- Parallel Generators --
Traditional functional languages can automatically parallelize calls to
mathematically pure functions with no side effects, e.g. when they occur
in loops. Potentially we could do that, but we might do better.
Unicon should acquire enough static analysis to identify generators that
(a) can safely be executed in parallel threads, and (b) are computationally
substantial enough that parallelism is worth doing. Perhaps there is a (c)
which is that the code that uses the results needs to be substantive
enough to justify the parallel thread.
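The semantics proposed in the DataParallel Operators item above, where L1+L2 produces the pairwise sums and "deep" means the rule applies recursively to nested lists, can be sketched as follows. This is a Python illustration, not the code under the DataParallel symbol; the function name deep_add is hypothetical, and raising an error on mismatched shapes is an assumption, since the text does not say what should happen in that case.

```python
# Hypothetical sketch: deep_add and its error behaviour are assumptions.

def deep_add(a, b):
    """Add two values; lists are added element by element, recursively."""
    a_is_list, b_is_list = isinstance(a, list), isinstance(b, list)
    if a_is_list and b_is_list:
        if len(a) != len(b):
            raise ValueError("lists differ in length")
        return [deep_add(x, y) for x, y in zip(a, b)]
    if a_is_list or b_is_list:
        raise TypeError("cannot add a list to a non-list")
    return a + b


L1 = [1, 2, [3, 4]]
L2 = [10, 20, [30, 40]]
L3 = deep_add(L1, L2)
print(L3)
```

Because each element sum is independent of the others, the per-element work is a natural candidate for being split across threads once the lists are large enough to justify it.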
It would be neat if Unicon supported persistent structures, structures
that survive across program executions. An approximation of this can
be accomplished by storing xencoded structures in GDBM files, but it
would be nice if it were easier and more direct.
Skills needed: C expert
The error messages, particularly from the runtime system, can be enhanced to
improve readability and help the programmer have a clue of how to fix the
problem encountered. Long error tracebacks should be written to a file and
a terser summary printed to standard error output. The default diagnostics
style should be friendlier to new Unicon programmers. It might be possible
to load/attach udb when a runtime error occurs.
Skills needed: C intermediate. Status: all of this is about finished,
except the "load udb when a runtime error happens" part.
Icont's parser needs to be modified to work with any YACC
implementation. At present it fails on some 64-bit Linuxes
if -O2 is turned on, apparently due to an issue in the old AT&T YACC
parser skeleton.
The Unicon IDE and our interface builder IVIB need to be joined together.
Level 0 of this would be: the two binaries built into one program that
switches back and forth with no coordination, communicating only through
files, and equivalent to running the programs independently. Level 1 would
add file save checks and automatically loading the selected file when
switching. Level 2 might reach a point where no file I/O is needed in order
to switch back and forth. Compare with Visual Studio or Netbeans.
Skills needed: Unicon intermediate.
Unicon's benchmark suite, described in UTR16, does not yet benchmark all
important language features, but does reveal a lot about Unicon's performance.
*Status: The public is invited to improve the benchmark implementations,
report performance on varying platforms, and propose benchmarks that would
improve the suite. Additional concurrency benchmarks have been requested.
Traditionally Unicon binaries were only available on MSWindows -- other
folks had to build from source code. Some Linux and Mac binaries have been
developed in recent years, including Debian and MacOS Sierra binaries,
but we need volunteers to regularly build, test,
and submit Unicon binary distributions on their preferred non-Windows
platforms, including Mac, Ubuntu, Fedora, etc.
The Unicon project is in need of one or more evangelists who see it as their
mission to spread the use of the language wherever it is beneficial to do so.