lml -- Lambda Markup Language

A.K.A. "Stupid NET Tricks"

Q: What is this?

;;(?
(load "lmlbase.scm")
;;?)

(!DOCTYPE foo SYSTEM "example.dtd")
                             (!""" this is a comment! """)
;;(?
(define the-document   ; bind it to a name we can use
;;?)

  ((foo attr1 = "a value" attr2 = "another attribute value")
   ((bar)
    "text"
    ((baz)
     "Some more text "
     ((point point_attr = "yadda")))
              (!""" here's a pair """
	                  """ of comments  """)
    "yet another text element"))
;;(?
)
;;?)

A: All of the following:

  • An lml document
  • A valid SGML document
  • An R5RS Scheme program
  • An entry in the (imaginary) International Obfuscated SGML Contest
  • An investigation of the question: "Is a markup language a programming language?"

How it works

  1. We define an SGML concrete syntax that is compatible with the syntax of the Scheme programming language.
  2. The lml document begins with a processing instruction that bootstraps a Scheme system with the procedures needed to process the DTD.
  3. The Scheme system then evaluates the (!DOCTYPE ) declaration as a macro, which loads (and executes) the DTD.
  4. Executing the DTD defines a procedure for each element type.
  5. Each such procedure, when applied to its arguments (the attribute assignments in a start tag), constructs a new procedure that processes the element's content.
  6. The document's root element is then evaluated, and so, recursively, are all of the elements it contains.
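
Steps 4 to 6 can be sketched in a few lines of Scheme. This is a minimal illustration, not the real bootstrap code: make-element-sketch is a hypothetical, simplified stand-in for the make-element procedure in lmlbase.scm, sketch-result is a name invented here, and the = sign is quoted so that the result prints cleanly (in the real document it is passed unquoted).

```scheme
;; Hypothetical, simplified stand-in for make-element in lmlbase.scm.
(define (make-element-sketch tag)
  (lambda attrs                          ; stage 1: the start tag's attributes
    (lambda content                      ; stage 2: the element's content
      (cons (cons tag attrs) content))))

;; What executing the DTD would define (step 4).
(define baz   (make-element-sketch 'baz))
(define point (make-element-sketch 'point))
(define point_attr 'point_attr)

;; (baz) and (point ...) each return a procedure (step 5); the extra pair
;; of parentheses applies that procedure to the content (step 6).
(define sketch-result
  ((baz)
   "Some more text "
   ((point point_attr '= "yadda"))))

;; sketch-result is ((baz) "Some more text " ((point point_attr = "yadda")))
```

Under these assumptions the value that comes back has the same shape as the markup that went in, which is the sense in which the bootstrap program below is an identity transform.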

lml's Concrete Syntax

NESTC ")"
NET   ")"
PIC   ";;?)"
MDO   "(!"
MDC   ")"
STAGO "(("
ETAGO "(/"
TAGC  ") )"
PIO   ";;(?"
RNI   "_"
OR    "^"
COM   '"""'
SHORTREF NONE
         '"'

Some more rules ...

  • No mixed content -- PCDATA can occur in only one element type. SHORTREF maps turn the double quote into the start and end tags for that element type.
  • Spaces are required around = in attribute assignments.
  • Comments are not allowed in PCDATA.
  • The processing instruction delimiters, PIO and PIC, look like comments to the Scheme interpreter.
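
A small experiment shows how a Scheme reader sees some of these delimiters. It assumes a reader that follows R5RS tokenization, in which a double quote ends an identifier. The names comment-parts, glued-name and pi-body-was-evaluated are made up for the illustration.

```scheme
;; "!" is an ordinary identifier; this is the definition used in lmlbase.scm.
(define ! (lambda args args))

;; MDO "(!" followed by COM '"""' reads as a call to ! with three strings:
;; "", " this is a comment! " and "".
(define comment-parts (!""" this is a comment! """))

;; Without spaces, attr1= is read as one identifier, which is presumably
;; why spaces are required around = in attribute assignments.
(define glued-name (car '(attr1="a value")))

;; PIO and PIC both begin with ";;", so the reader skips those two lines
;; and evaluates only what lies between them.
;;(?
(define pi-body-was-evaluated #t)
;;?)
```

Here comment-parts is a list of three strings, and glued-name is the single symbol attr1= rather than a name followed by an equals sign.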

The Scheme bootstrap program -- lmlbase.scm

;; lmlbase.scm
;; The simplest processing of an lml document ... the identity transform
;;   

;;
;; construct a list of name/value pairs from an element's attributes
;;

(define (make-attrs mlist)
  (let loop ((alist mlist)
             (attr-map '()))
    (if (null? alist)
        attr-map
        (loop (cdddr alist)
              (cons (list (car
                           alist) (caddr alist)) attr-map)))))

;;
;; decompose attributes into a sequence of name = "value"
;;

(define (attributes attr-list)
  (if (null? attr-list)
      '()
      (cons  (caar attr-list) 
             (cons '= 
                   (cons (cadar attr-list) (attributes (cdr attr-list)))))))

;;
;; build a procedure for an element type.  The tagname will become
;; a procedure which constructs the element's contents
;;

(define (make-element tag)
  (lambda attrs 
    (let ((attlist (reverse (make-attrs attrs))))
      (lambda x 
        (let loop ((thelist x)           ;; thelist is the element content
                   (nodeview '()))       ;; we store the content in nodeview
          (if (null? thelist)            ;; finished with contents?
              ;; yes, assemble tag with attributes and contents
              (cons (cons tag
                          (attributes attlist)) (reverse nodeview))
              ;; no, add one child element to nodeview and recurse
              (loop (cdr thelist)
                    (cons (car thelist) nodeview))))))))

;;
;; the minimal definitions to allow the DTD to be executed as
;; a program
;;

(define-syntax !ELEMENT
  (syntax-rules ()
    ((!ELEMENT ename EMPTY)
     (define ename
       (make-element 'ename)))
    ((!ELEMENT ename ('_PCDATA))
     (define ename
       (make-element 'ename)))
    ((!ELEMENT ename ( stuff ... ) rep )
     (define ename
       (make-element 'ename)))))

(define-syntax !ATTLIST
  (syntax-rules ()
    ((!ATTLIST tag name type default ... )
     (define name 'name))))

(define-syntax !DOCTYPE
  (syntax-rules ()
    ((!DOCTYPE root-element-name SYSTEM pathname)
     (load pathname))))

;;
;; some more defines just so all symbols in DTD are bound
;;

(define !SHORTREF 
  (lambda args '!SHORTREF))

(define !USEMAP
  (lambda args '!USEMAP))

(define !ENTITY
  (lambda args '!ENTITY))

(define ! 
  (lambda args args))

(define docmap 'docmap)
(define starttext 'starttext)
(define STARTTAG 'starttag)
(define ENDTAG 'endtag)
(define endtext 'endtext)
(define textmap 'textmap)
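
To see what the attribute helpers do, the following sketch feeds them the arguments that the start tag of foo in the example document would supply. The two definitions are copied from the listing above so that the snippet runs on its own. The names sample-args, attr-pairs and attr-tokens are invented for the illustration, and attr1 and attr2 are quoted here instead of being bound by !ATTLIST.

```scheme
;; Copied from lmlbase.scm.
(define (make-attrs mlist)
  (let loop ((alist mlist)
             (attr-map '()))
    (if (null? alist)
        attr-map
        (loop (cdddr alist)
              (cons (list (car alist) (caddr alist)) attr-map)))))

(define (attributes attr-list)
  (if (null? attr-list)
      '()
      (cons (caar attr-list)
            (cons '=
                  (cons (cadar attr-list) (attributes (cdr attr-list)))))))

;; The arguments the foo start tag passes to its element procedure:
;; a name, the procedure bound to =, a value, and so on.
(define sample-args
  (list 'attr1 = "a value" 'attr2 = "another attribute value"))

;; make-attrs keeps each name and value, drops the = procedure, and
;; accumulates the pairs in reverse order.
(define attr-pairs (make-attrs sample-args))
;; attr-pairs is ((attr2 "another attribute value") (attr1 "a value"))

;; make-element reverses the pairs before flattening them again.
(define attr-tokens (attributes (reverse attr-pairs)))
;; attr-tokens is (attr1 = "a value" attr2 = "another attribute value")
```

Because make-attrs steps through its argument list with cdddr, it appears to rely on the arguments arriving as complete name = value triples. A start tag that breaks that pattern would signal an error instead of being repaired.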

The SGML Declaration -- lml.dcl

<!SGML -- SGML Declaration for valid LML documents --
     "ISO 8879:1986 (WWW)"

    CHARSET
          BASESET  "ISO Registration Number 177//CHARSET
                    ISO/IEC 10646-1:1993 UCS-4 with
                    implementation level 3//ESC 2/5 2/15 4/6"
         DESCSET 0       9       UNUSED
                 9       2       9
                 11      2       UNUSED
                 13      1       13
                 14      18      UNUSED
                 32      95      32
                 127     1       UNUSED
                 128     32      UNUSED
                 160     65376   160
     CAPACITY NONE

     SCOPE DOCUMENT

     SYNTAX
         SHUNCHAR NONE
         BASESET "ISO Registration Number 176//CHARSET
                 ISO/IEC 10646-1:1993 UCS-4 with implementation 
                 level 3//ESC 2/5 2/15 4/6"
         DESCSET
             0 1114112 0
         FUNCTION
             RE    13
             RS    10
             SPACE 32
             TAB   SEPCHAR 9

         NAMING   LCNMSTRT ""
                  UCNMSTRT ""
                  LCNMCHAR ".-_:"    
                  UCNMCHAR ".-_:"
                  NAMECASE GENERAL NO
                           ENTITY  NO

         DELIM
             GENERAL SGMLREF
             HCRO "&#x" -- 38 is the number for ampersand --
             NESTC ")"
             NET ")"
             PIC ";;?)"
             MDO "(!"
             MDC ")"
             STAGO "(("
             ETAGO "(/"
             TAGC  ") )"
             PIO   ";;(?"
             RNI   "_"
             OR    "^"
             COM   '"""'
             SHORTREF NONE
                 '"'

         NAMES
             SGMLREF

         QUANTITY NONE

         ENTITIES
             "amp" 38
             "lt" 60
             "gt" 62
             "quot" 34
             "apos" 39

     FEATURES
         MINIMIZE
             DATATAG NO
             OMITTAG NO
             RANK NO
             SHORTTAG
                 STARTTAG
                     EMPTY NO
                     UNCLOSED NO 
                     NETENABL ALL
                 ENDTAG
                     EMPTY NO 
                     UNCLOSED NO
                 ATTRIB
                     DEFAULT YES
                     OMITNAME NO
                     VALUE NO
             EMPTYNRM YES
             IMPLYDEF
                 ATTLIST NO
                 DOCTYPE NO
                 ELEMENT NO
                 ENTITY NO
                 NOTATION NO
         LINK
             SIMPLE NO
             IMPLICIT NO
             EXPLICIT NO
         OTHER
             CONCUR NO
             SUBDOC NO
             FORMAL NO
             URN NO
             KEEPRSRE YES
             VALIDITY TYPE
             ENTITIES
                 REF ANY
                 INTEGRAL NO
     APPINFO NONE

>

The DTD for the example document -- example.dtd

(!ELEMENT foo ( bar )*)
(!ELEMENT bar ( baz ^ text )*)
(!ELEMENT baz ( point ^ foo ^ bar ^ baz ^ text )*)
(!ELEMENT point  EMPTY )
(!ELEMENT text (_PCDATA))

(!ATTLIST foo
        attr1 CDATA "blah")
(!ATTLIST foo
        attr2 CDATA _IMPLIED)

(!ATTLIST point
        point_attr CDATA _REQUIRED)

(!SHORTREF docmap '"' starttext)
(!SHORTREF textmap '"' endtext)

(!ENTITY starttext STARTTAG "text")
(!ENTITY endtext ENDTAG "text")

(!USEMAP docmap foo)
(!USEMAP textmap text)
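
The reason attribute names can appear unquoted in the document can be sketched as follows. The !ATTLIST macro is copied from lmlbase.scm, and the two declarations come from the DTD above. The name start-tag-args is made up for the illustration.

```scheme
;; Copied from lmlbase.scm.
(define-syntax !ATTLIST
  (syntax-rules ()
    ((!ATTLIST tag name type default ...)
     (define name 'name))))

;; Two declarations from example.dtd. The macro ignores the tag, the
;; declared type and the default; it only binds the attribute name.
(!ATTLIST foo
        attr1 CDATA "blah")
(!ATTLIST point
        point_attr CDATA _REQUIRED)

;; attr1 now evaluates to the symbol attr1, and = is the ordinary
;; numeric-equality procedure, so every token in a start tag has a value
;; and the tag can be evaluated as a plain procedure call.
(define start-tag-args (list attr1 = "a value"))
```

Note that, as written, the macro does nothing with declared defaults such as "blah", so this sketch says nothing about how defaults or _REQUIRED would be enforced.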

Comments?

William D. Lindsey