\chapter{Introduction}
\pagenumbering{arabic}
\section{What a Lisp operating system is}
A Lisp Operating System (LispOS for short) is not just another
operating system that happens to be written in Lisp (although that
would be a good thing in itself). For the purpose of this document, a
LispOS is also an operating system that uses the Lisp interactive
environment as an inspiration for the interface between the user and
the system, and between applications and the system.

In this document, we give some ideas on what a LispOS might contain,
how it would be different from existing operating systems, and how
such a system might be created.
\section{Problems with existing systems}
\subsection{The concept of a \emph{process}}
Most popular existing operating systems are derived from \unix{}, which
was written in the 1970s.  The computers for which \unix{} was intended
had a very small address space, too small for most usable end-user
applications.  To solve this problem, the creators of \unix{} used the
concept of a \emph{process}. A large application was written so
that it consisted of several smaller programs, each of which ran in
its own address space. These smaller programs would communicate by
having one application write text to its output stream for another
application to read. This method of communication was called
a \emph{pipe} and a sequence of small applications was called
a \emph{pipeline}. As a typical example of a chain of applications,
consider the pipeline for producing a typeset document (one of the
main applications for which \unix{} was designed).  This chain had a
program for creating tables (called \texttt{tbl}), a program for
generating pictures (called \texttt{pic}), a program for generating
equations (called \texttt{eqn}), and of course the typesetting program
itself (called \texttt{troff}).

Using \unix{}-style pipes to communicate between different components of
an application has several disadvantages:
\begin{itemize}
\item To communicate complex data structures (such as trees or
graphs), the creating component must convert them to a stream of
bytes, and the using component must then analyze and parse that
stream into an equivalent data structure.  Not only is
this unparsing/parsing inefficient in terms of computing
resources, but it is also problematic from a
software-engineering point of view, because the external format
must be specified and maintained as a separate aspect of each
component.
\item An artificial \emph{order} between the different components is
imposed, so that components cannot work as libraries that other
components can use in any order. Sometimes (as in the example
of the \texttt{troff} chain) the end result of a computation
depends in subtle ways on the order between the components of
the chain. Introducing a new component may require other
components to be modified.
\end{itemize}
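
The first disadvantage can be made concrete with a small sketch.  It is
written in Python purely for brevity and is not part of \sysname{}; the
\texttt{Node} class, the helper functions, and the choice of JSON as the
external format are all assumptions made for illustration.  The producer
has to flatten a tree into bytes and the consumer has to parse those
bytes back, whereas inside a single address space the consumer could
simply be handed a reference to the tree.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node; hypothetical stand-in for any structured data."""
    label: str
    children: list["Node"] = field(default_factory=list)


def unparse(node: Node) -> bytes:
    """Producer side: flatten the tree into a byte stream for a pipe."""
    def to_dict(n: Node) -> dict:
        return {"label": n.label, "children": [to_dict(c) for c in n.children]}
    return json.dumps(to_dict(node)).encode("utf-8")


def parse(data: bytes) -> Node:
    """Consumer side: rebuild an equivalent tree from the byte stream."""
    def from_dict(d: dict) -> Node:
        return Node(d["label"], [from_dict(c) for c in d["children"]])
    return from_dict(json.loads(data.decode("utf-8")))


tree = Node("document", [Node("table"), Node("equation", [Node("x")])])

# Across a pipe: both ends must agree on, and maintain, the external format.
copy = parse(unparse(tree))

# Within a single address space: the consumer just receives the reference.
shared = tree
```

The copy is structurally equal to the original but is a distinct object,
so every stage of a pipeline pays for the conversion in both directions.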
Pipes also have some advantages though.  In particular, they provide a
\emph{synchronization} mechanism between programs, making it very easy to
implement producer/consumer control structures.

It is an interesting observation that in most textbooks on
operating systems, the concept of a process is presented as playing
a central role in operating-system design, whereas it ought to be
presented as an unfortunate necessity due to the limited address
space of existing minicomputers in the 1970s. It is also presented
as \emph{the} method for obtaining some kind of \emph{security},
preventing one application from intentionally or accidentally
modifying the data of some other application. In reality, there are
several ways of obtaining such security, and separate address spaces
should be considered to be a method with too many disadvantages.

Nowadays, computers have addresses that are 64 bits wide, making it
possible to address more than 18 exabytes of data.  To get an idea of the
order of magnitude of such a number, consider a fairly large disc
that can hold a terabyte of data.  Then each byte of 18 million such
discs can be directly addressed by the processor.  We can thus
consider the problem of too small an address space to be solved. The
design of \sysname{} takes advantage of this large address space to
find better solutions to the problems that processes were intended to
solve.
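
As a rough check of these figures, assume decimal (SI) prefixes, so that
an exabyte is $10^{18}$ bytes and a terabyte is $10^{12}$ bytes.  Then
\[
2^{64}\ \mathrm{bytes} = 18\,446\,744\,073\,709\,551\,616\ \mathrm{bytes}
\approx 1.8 \times 10^{19}\ \mathrm{bytes} \approx 18.4\ \mathrm{exabytes},
\]
\[
\frac{2^{64}\ \mathrm{bytes}}{10^{12}\ \mathrm{bytes\ per\ disc}}
\approx 1.8 \times 10^{7}\ \mathrm{discs}.
\]
With binary prefixes instead, the same address space corresponds to exactly
$2^{64}/2^{40} = 2^{24} = 16\,777\,216$ discs of one tebibyte each.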
\subsection{Hierarchical file systems}
Existing operating systems come with a \emph{hierarchical file
system}.  There are two significant problems,
namely \emph{hierarchical} and \emph{file}.

The \emph{hierarchy} is also a concept that dates back to the
1970s, and it was considered a vast improvement on flat file
systems. However, as some authors%
\footnote{See
\texttt{http://www.shirky.com/writings/ontology\_overrated.html}}
explain, most things are not naturally hierarchical. A hierarchical
organization imposes an artificial order between names. Whether a
document is called \texttt{Lisp/Programs/2013/stuff},
\texttt{Programs/Lisp/2013/stuff}, or something else like
\texttt{2013/Programs/Lisp/stuff}, is usually not important.

The problem with a \emph{file} is that it is only a sequence of
bytes with no structure.  This lack of structure fits the \unix{} pipe
model very well, because intermediate steps between individual
software components can be saved to a file without changing the
result. But it also means that in order for complex data structures
to be stored in the file system, they have to be transformed into a
sequence of bytes. And whenever such a structure needs to be
modified by some application, it must again be parsed and
transformed into an in-memory structure.
\subsection{Distinction between primary and secondary memory}
Current systems (at least for desktop computers) make a very clear
distinction between primary and secondary memory. Not only are the
two not the same, but they also have totally different semantics:
\begin{itemize}
\item Primary memory is \emph{volatile}. When power is turned off,
whatever was in primary memory is lost.
\item Secondary memory is \emph{permanent}. Stored data will not
disappear when power is turned off.
\end{itemize}
This distinction coupled with the semantics of the two memories
creates a permanent conundrum for the user of most applications, in
that if current application data is \emph{not} saved, then it will
be lost in case of power loss, and if it \emph{is} saved, then
previously saved data is forever lost.
Techniques were developed as early as the 1960s for presenting
primary and secondary memory as a single abstraction to the user.
For example, the \multics{} system had a single hierarchy of fixed-size
byte arrays (called segments) that served as permanent storage, but
that could also be treated as any in-memory array by applications.
As operating systems derived from \unix{} became widespread, these
techniques were largely forgotten.
\section{Objectives for a Lisp operating system}
The three main objectives of a Lisp operating system correspond to
solutions to the three main problems with existing systems as indicated
in the previous section.
\subsection{Single address space}
Instead of each application having its own address space, we propose
that all applications share a single large address space. This way,
applications can share data simply by passing pointers around,
because a pointer is globally valid, unlike pointers in current
operating systems.
Clearly, if there is a single address space shared by all
applications, there needs to be a different mechanism to ensure
\emph{protection} between them so that one application cannot
intentionally or accidentally destroy the data of another application.
Many high-level programming languages (in particular \lisp{}, but
others as well) propose a solution to this problem by simply not
allowing users to execute arbitrary machine code.  Instead, they allow
only code that has been produced from the high-level notation of the
language and which excludes arbitrary pointer arithmetic so that the
application can only address its own data.  This technique is
sometimes called a ``trusted compiler''.

It might sometimes be desirable to write an application in a
low-level language like C or even assembler, or it might be
necessary to run applications that have been written for other
systems. Such applications could co-exist with the normal ones, but
they would have to work in their own address space as with current
operating systems, and with the same difficulties of communicating
with other applications.
\subsection{Object store based on attributes}
Instead of a hierarchical file system, we propose an \emph{object
store} which can contain any objects.  If a file (i.e., a
sequence of bytes) is desired, it would be stored as an array of
bytes.

Instead of organizing the objects into a hierarchy, objects in the
store can optionally be associated with an arbitrary number
of \emph{attributes}.  These attributes are \emph{key/value} pairs,
such as the date of creation of the archive entry, the creator (a
user) of the archive entry, and the \emph{access permissions} for
the entry.  Notice that attributes are not properties of the objects
themselves, but only of the archive entry that allows an object to
be accessed.  Some attributes might be derived from the contents of the
object being stored, such as the \emph{sender} or the \emph{date} of
an email message.  It should be possible to accomplish most searches
of the store without accessing the objects themselves, but only the
attributes.  Occasionally, contents must be accessed, such as when a raw
search of the contents of a text is wanted.
For a more detailed description of the object store, see
\refChap{chap-object-store}.
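
To make the idea of attribute-based lookup concrete, here is a minimal
in-memory sketch, again in Python for brevity.  The class names
\texttt{Entry} and \texttt{ObjectStore}, the method names, and the
attribute keys are hypothetical and do not describe the interface of the
actual object store; a real store would also be persistent.  The point is
only that a search inspects the attributes of the archive entries and
never the stored objects themselves.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Entry:
    """An archive entry: a reference to an object plus its attributes."""
    obj: Any
    attributes: dict[str, Any] = field(default_factory=dict)


class ObjectStore:
    """Toy in-memory store (hypothetical API, not persistent)."""

    def __init__(self) -> None:
        self._entries: list[Entry] = []

    def add(self, obj: Any, **attributes: Any) -> Entry:
        """Store obj under a new entry carrying the given key/value pairs."""
        entry = Entry(obj, dict(attributes))
        self._entries.append(entry)
        return entry

    def search(self, **wanted: Any) -> list[Any]:
        """Return objects whose entries match every wanted key/value pair.

        Only the attributes are examined, never the objects themselves.
        """
        return [
            e.obj
            for e in self._entries
            if all(k in e.attributes and e.attributes[k] == v
                   for k, v in wanted.items())
        ]


store = ObjectStore()
store.add(b"\x00\x01\x02", kind="file", creator="alice")  # a "file" is a byte array
store.add({"body": "hello"}, kind="email", sender="bob", creator="alice")
store.add([1, 2, 3], kind="list", creator="bob")

by_alice = store.search(creator="alice")
emails_from_bob = store.search(kind="email", sender="bob")
```

The same object could be reached through any combination of attributes,
so no single ordering of names is imposed as it is with a path.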
It is sometimes desirable to group related objects together as
with \emph{directories} of current operating systems.  Should a user
want such a group, it would simply be another object (say an instance
of the class \texttt{directory}) in the store.  Users who cannot
adapt to a non-hierarchical organization can even store such
directories as one of the objects inside another directory.
When (a pointer to) an object is returned to a user as a result of a
search of the object store, it is actually similar to what is called
a ``capability'' in the operating-system literature.  Such a
capability is essentially only a pointer with a few bits indicating
what \emph{access rights} the user has to the object.  Each creator
may interpret the contents of those bits as he or she likes, but
typically they would be used to restrict access, so that for
instance executing a \emph{reader} method is allowed, but executing
a \emph{writer} method is not.
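
The following Python sketch illustrates the reader/writer restriction.
The names \texttt{Rights}, \texttt{Capability}, and \texttt{Message} are
hypothetical, and modelling the access bits as a flag enumeration is an
assumption made for illustration.  Note that Python itself does not
prevent a program from reaching the underlying object directly; in the
proposed system that guarantee would have to come from the trusted
compiler.

```python
from enum import Flag, auto


class Rights(Flag):
    """The access bits carried alongside the pointer."""
    READ = auto()
    WRITE = auto()


class Capability:
    """A reference to an object together with bits saying what is allowed."""

    def __init__(self, target, rights: Rights) -> None:
        self._target = target
        self._rights = rights

    def read(self, name: str):
        """Run a reader: allowed only if the READ bit is set."""
        if Rights.READ not in self._rights:
            raise PermissionError("reader method not permitted")
        return getattr(self._target, name)

    def write(self, name: str, value) -> None:
        """Run a writer: allowed only if the WRITE bit is set."""
        if Rights.WRITE not in self._rights:
            raise PermissionError("writer method not permitted")
        setattr(self._target, name, value)


class Message:
    def __init__(self, sender: str) -> None:
        self.sender = sender


msg = Message("bob")
read_only = Capability(msg, Rights.READ)
full = Capability(msg, Rights.READ | Rights.WRITE)
```

Both capabilities refer to the same object; they differ only in the bits
that accompany the pointer.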
\subsection{Single memory abstraction}
Instead of two different memory abstractions (primary and
secondary), the Lisp operating system would contain a single
abstraction which looks like any interactive Lisp system, except
that data is permanent.
Since data is permanent, application writers are encouraged to
provide a sophisticated \emph{undo} facility.
The physical main (semiconductor) memory of the computer simply acts
as a \emph{cache} for the disk(s), so that the address of an object
uniquely determines where on the disk it is stored. The cache is
managed as an ordinary \emph{virtual memory} with existing
algorithms.
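
As an illustration only, the sketch below treats main memory as a small
cache of disk pages.  The page size of 4096 bytes, the least-recently-used
eviction policy, the dictionary standing in for the disk, and the names
\texttt{PageCache} and \texttt{disk\_location} are all assumptions; the
text above says only that existing virtual-memory algorithms would be
used.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page; an assumed value


def disk_location(address):
    """Map an address to (page number on disk, offset within the page)."""
    return divmod(address, PAGE_SIZE)


class PageCache:
    """Main memory modelled as an LRU cache of disk pages."""

    def __init__(self, disk, capacity):
        self.disk = disk            # dict page number -> bytearray, stand-in for a disk
        self.capacity = capacity    # number of pages that fit in main memory
        self.pages = OrderedDict()  # cached pages, least recently used first

    def _page(self, number):
        if number in self.pages:
            self.pages.move_to_end(number)        # mark as recently used
        else:
            if len(self.pages) >= self.capacity:  # evict least recently used
                victim, data = self.pages.popitem(last=False)
                self.disk[victim] = data          # write the page back
            # Bring the page in; a page never written reads as zeros.
            self.pages[number] = bytearray(self.disk.get(number, bytes(PAGE_SIZE)))
        return self.pages[number]

    def load(self, address):
        number, offset = disk_location(address)
        return self._page(number)[offset]

    def store(self, address, value):
        number, offset = disk_location(address)
        self._page(number)[offset] = value


disk = {}
cache = PageCache(disk, capacity=2)
cache.store(5, 42)                # touches page 0
cache.store(PAGE_SIZE + 1, 7)     # touches page 1
cache.store(2 * PAGE_SIZE, 9)     # touches page 2, evicting page 0 to disk
value = cache.load(5)             # page 0 is brought back from disk
```

An application sees only addresses; whether the addressed byte currently
resides in main memory or on disk is decided by the cache.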
\subsection{Other features}
\subsubsection{Crash proof (maybe)}
There is extensive work on crash-proof systems, be it operating
systems or database systems.  In our opinion, this work is
confusing in that the objective is not clearly stated.
Sometimes the objective is stated as the desire that no data be lost
when power is lost.  But the solution to that problem already exists
in every laptop computer; it simply provides a \emph{battery} that
allows the system to continue to work, or to be \emph{shut down} in a
controlled way.
Other times, the objective is stated as a protection against
defective software, so that data is stored at regular intervals
(checkpointing) perhaps combined with a \emph{transaction log} so
that the state of the system immediately before a crash can always
be recovered. But it is very hard to protect oneself against
defective software. There can be defects in the checkpointing code
or in the code for logging transactions, and there can be defects in
the underlying file system.  We believe that it is a better use of
developer time to find and eliminate defects than to aim for
recovery from the consequences of existing defects.
\subsubsection{Multiple simultaneous environments}
To allow for a user to add methods to standard generic functions (such
as \texttt{print-object}) without interfering with other users, we
suggest that each user gets a different \emph{global environment}.
The environment maps \emph{names} to \emph{objects} such as functions,
classes, types, packages, and more. Immutable objects (such as the
\texttt{common-lisp} package) can exist in several different
environments simultaneously, but other objects (such as the generic
function \texttt{print-object}) would be different in different
environments.
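
The sharing rule just described can be sketched as follows, once more in
Python rather than \lisp{} for brevity.  The classes
\texttt{GenericFunction} and \texttt{Environment} and the function
\texttt{make\_user\_environment} are hypothetical stand-ins and are much
simpler than their \lisp{} counterparts.  Each user gets a private copy
of the mutable generic function, while the immutable package object is
shared by all environments.

```python
class GenericFunction:
    """Small stand-in for a generic function: a table of methods by class."""

    def __init__(self, name):
        self.name = name
        self.methods = {}

    def add_method(self, cls, function):
        self.methods[cls] = function

    def __call__(self, obj):
        for cls in type(obj).__mro__:      # most specific class first
            if cls in self.methods:
                return self.methods[cls](obj)
        raise TypeError(f"no applicable method for {obj!r}")


class Environment:
    """Maps names to objects for one user."""

    def __init__(self, bindings):
        self.bindings = dict(bindings)

    def lookup(self, name):
        return self.bindings[name]


def make_user_environment(default):
    """Share immutable objects; give the user private copies of mutable ones."""
    bindings = {}
    for name, obj in default.bindings.items():
        if isinstance(obj, GenericFunction):
            clone = GenericFunction(obj.name)
            clone.methods = dict(obj.methods)  # private method table
            bindings[name] = clone
        else:
            bindings[name] = obj               # shared between environments
    return Environment(bindings)


print_object = GenericFunction("print-object")
print_object.add_method(object, repr)
common_lisp = frozenset({"car", "cdr", "cons"})  # stand-in for an immutable package

default = Environment({"print-object": print_object, "common-lisp": common_lisp})
alice = make_user_environment(default)
bob = make_user_environment(default)

# Alice customizes how integers print; Bob and the default are unaffected.
alice.lookup("print-object").add_method(int, lambda n: f"#<integer {n}>")
```

Because the default environment is never modified, it remains available
as a reference from which a damaged user environment could be rebuilt.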
Multiple environments would also provide more safety for users in
that if a user inadvertently removes some system feature, then it
can be recovered from a default environment, and in the worst case a
fresh default environment could be installed for a user who
inadvertently destroyed large parts of his or her environment.
Finally, multiple environments would simplify experimentation with
new features without running the risk of destroying the entire
system. Different versions of a single package could exist in
different environments.

For more details on multiple environments, see
\refChap{chap-environments}.
\section{How to accomplish it}
The most important aspect of a Lisp operating system is not that all
the code be written in Lisp, but rather that the system present a
Lisp-like interface between users and the system and between
applications and the system.  It is therefore legitimate to take
advantage of some
existing system (probably \linux{} or some \bsd{} version) in order to
provide services such as device drivers, network communication, thread
scheduling, etc.
\subsection{Create a Lisp system to be used as basis}
The first step is to create a \cl{} system that can be used as a basis
for the Lisp operating system. It should already allow for multiple
environments, and it should be available on 64-bit platforms.
Preferably, this system should use as little \clanguage{} code as
possible and interact directly with the system calls of the underlying
kernel.
\subsection{Create a single-user system as a \unix{} process}

In parallel with creating a new \cl{} system, it is possible to
implement and test many of the features of the interface between the
system and the users, such as the object store (probably without
access control) using an existing \cl{} system running as a process in
an ordinary operating system.

The result of this activity would be sufficient to write or adapt
several applications such as text editors, inspectors, debuggers, GUI
libraries, etc. for the system.
\subsection{Create a multi-user system as a \unix{} process}
With the new \cl{} system complete and the object store implemented,
it will be possible to create a full multi-user system, including
protection, as a \unix{} process, where the \unix{} system would play
the role of a virtual machine, supplying essential services such as
input/output, networking, etc.
\subsection{Create device drivers}
The final step is to replace the temporary \unix{} kernel with native
device drivers for the new system.