How to handle large data in DrScheme

In the PLT discussion list thread titled “How to handle large data”, Kenichi wanted to load a very large file (120,000 lines) into DrScheme, but DrScheme hung. Noel explained that DrScheme adds a lot of overhead for debugging and performance instrumentation, and that MzScheme could load the file just fine. Eli explained that DrScheme’s overhead could be made much smaller by making the contents of that file one big piece of quoted data, and that the overhead could be eliminated completely by putting the data in a separate file and reading it in at run time.
The following code is a generalization of the two approaches that Eli described there, which were tailored to Japanese postal data.

; Approach 1: keep the data in the source file as one big piece of quoted data.
(define info-list-data
  '(("Alpha" "Bravo" 1)
    ("Delta" "Echo" 2)))
(define-struct info (fname lname budget))
; Convert one raw entry (a three-element list) into an info struct.
(define (info-data->info entry)
  (make-info (first entry) (second entry) (third entry)))
(define info-list
  (map info-data->info info-list-data))
; Approach 2: keep the data in a separate file and read it in as a single datum.
(define info-list-from-file
  (map info-data->info
       (call-with-input-file "data-overhead-elimination.data" read)))
(display "Total budget: ")
(display (apply + (map info-budget info-list-from-file)))
(newline)
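
The code above assumes that the data file already exists. As a minimal sketch of how such a file could be produced, here is a round trip, assuming a current Racket (the successor to PLT Scheme). The helper names save-records! and load-records are my own, and a temporary file stands in for "data-overhead-elimination.data". The point to notice is that the file holds the list as a single datum with no leading quote, because read returns the datum without evaluating it.

```racket
#lang racket

;; Write the whole list as ONE datum so a single `read` gets all of it back.
(define (save-records! path records)
  (with-output-to-file path
    (λ () (write records))
    #:exists 'replace))

;; Read that single datum back; nothing is evaluated, so no quote is needed.
(define (load-records path)
  (call-with-input-file path read))

(define sample
  '(("Alpha" "Bravo" 1)
    ("Delta" "Echo" 2)))

;; A temporary file keeps the sketch from touching the working directory.
(define tmp (make-temporary-file))
(save-records! tmp sample)
(define loaded (load-records tmp))
(delete-file tmp)
```

Passing "data-overhead-elimination.data" as the path instead of the temporary file would produce the file that the earlier code reads.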

All Scheme Search

All Scheme Search is a Google-based search engine for “Everything about the Scheme Programming Language”.
If seven pages of search results for “keyword arguments” is any measure, this looks pretty interesting :).
Addendum: 01/05/08
Here are two more: Scheme from the Florida Keys (actively updated) and Search PLT Scheme (not actively updated).

Automata via Macros

Last year I asked a question about Automata via Macros on the PLT discussion list. The paper works through an extended example of a problem posed by Shriram Krishnamurthi at the Lightweight Languages 1 conference. I was trying to understand how one might use the same FSA language to generate different outputs. Many folks explained how that would work and the thread finished. It didn’t finish for me, though. To say my understanding of the solution was superficial would be an overstatement. I knew I didn’t “get it”, and kept it on my list to revisit once I had a better grasp on things. That was a year and a half ago.
Over my holiday I revisited the paper and found, for lack of a better way to say it, that it “made sense”. In the past year I’ve tried to read a lot, write a lot of code, read other people’s code, and, most importantly, do so with a discipline and attention to detail that I had not applied before. That seems to have paid off: it felt like I was reading the paper with brand new eyes.
This time around, I also listened to a recording of the presentation itself and watched the PowerPoint that went along with it. This was especially valuable because it drew attention to various tidbits that appear in the paper but were much easier to pick out once Shriram pointed them out in the context of the presentation. It was those comments, along with what I had learned in the past year, that really got me thinking about things again.
In a conversation, Aaron Hsu once said something to the effect that “Scheme is hard because it is subtle”. That rang true with my intuition, but I had a difficult time verbalizing exactly why. Outwardly Scheme appears to be “simple”, but the more you learn about it and the deeper you dig, the more you see that it is deceptively simple.
Scheme seems to take only a few days to learn, but a few years to master; but why? I don’t know yet, but what I do know is that some of the ideas and topics presented in this paper and presentation are surely some of the aforementioned “subtleties” that serve as landmarks on this multi-year path towards understanding Scheme. For my own reference, and for others who may be studying Scheme and wondering exactly why “Subtlety is hard”, perhaps this paper will serve to shed some light on things.
My advice: read LAMBDA: The Ultimate Imperative and LAMBDA: The Ultimate GOTO first.

Abstract

Lisp programmers have long used macros to extend their language. Indeed, their success has inspired macro notations for a variety of other languages, such as C and Java. There is, however, a paucity of effective pedagogic examples of macro use. This paper presents a short, non-trivial example that implements a construct not already found in mainstream languages. Furthermore, it motivates the need for tail-calls, as opposed to mere tail-recursion, and illustrates how support for tail-call optimization is crucial to support a natural style of macro-based language extension.

Key notes from the audio

These are mostly my notes and summaries of some key points that rang true with me. The summaries do not try to explain the underlying ideas in full; that would be too far outside the scope of this blog post, and other papers and presentations do that work better.

  • 09:00: Without the hidden gems, syntax, macros, and tail calls, it all falls apart.
  • 11:16: What you generate must fit well into the language.
  • 18:00: It is really tail-transfer (tail-calls) that matter, not tail-recursion.
  • 21:00: Scheme is really an indentation based language.
  • 23:00: Macro languages belong in modules; you can use your data (DSL) where you wish.
  • 25:30: If you claim to be smart, leverage the work of others.
  • 26:10: Scheme is subtle. Takes years to understand what is beautiful. You can look at a book but you won’t notice. The language hides its jewels.
  • 26:45: If you don’t get tail calls, you don’t get it.
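
To make the 18:00 note concrete, here is a small sketch of my own (it is not from the talk), assuming a current Racket. The two procedures, hypothetically named my-even? and my-odd?, are not tail-recursive in the narrow sense, because neither one calls itself. Every call they make is still a tail call, so control simply transfers from one to the other and no stack frames pile up.

```racket
#lang racket

;; Each procedure ends by calling the OTHER one in tail position.
(define (my-even? n)
  (if (zero? n)
      #t
      (my-odd? (sub1 n))))

(define (my-odd? n)
  (if (zero? n)
      #f
      (my-even? (sub1 n))))

;; A million transfers back and forth, all in constant stack space.
(define million-is-even (my-even? 1000000))
```

A language that only optimized self-calls would need a frame for every one of those calls; the states of an automaton calling one another behave the same way as this pair.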

Key notes from the paper

  • P2P3: “A macro gives the programmer a consistent, representation-independent means of describing the datum
    while still resolving the representation before compilation.”
  • P7P4: Macros are likely to be used to optimize the abstraction that the data represents.
  • P8P3: Scheme’s macro system could be called “a lightweight compiler API”.
  • P9P2: Tail-calls ought to be called tail-transfer per Steele. I wonder why they didn’t push that definition more.
  • P9P3: Here is a good explanation of tail calls.
  • P9P5: “languages represent the ultimate form of reuse, because we get to reuse everything from the mathematical (semantics) to the practical (libraries), as well as decades of research and toil in compiler construction (Hudak, 1998).”
  • P13P2: “it shows how features that would otherwise seem orthogonal, such as macros and tail-calls, are in fact intimately wedded together; in particular, the absence of the latter would greatly complicate use of the former.”

The automaton language

; An automaton that recognizes the language c(ad)*r
(define m
  (automaton init
             (init :
                   (c → more))
             (more :
                   (a → more)
                   (d → more)
                   (r → end))
             (end : accept)))
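
The paper builds up the macro behind this notation step by step. What follows is only my simplified sketch of the idea, assuming a current Racket, with the per-state work split into a helper macro named process-state here; see the paper for the real thing. Each state becomes a procedure over the remaining input (a list of symbols), and each transition becomes a tail call to the procedure for the next state.

```racket
#lang racket

;; Turn the body of one state into a procedure over the remaining input.
(define-syntax process-state
  (syntax-rules (accept →)
    ;; An accepting state succeeds only when the input is used up.
    [(_ accept)
     (λ (stream) (empty? stream))]
    ;; Otherwise dispatch on the next symbol; each transition is a tail call.
    [(_ (label → target) ...)
     (λ (stream)
       (and (pair? stream)
            (case (first stream)
              [(label) (target (rest stream))]
              ...
              [else #f])))]))

;; Bind every state name to its procedure and return the initial one.
(define-syntax automaton
  (syntax-rules (:)
    [(_ init-state (state : response ...) ...)
     (letrec ([state (process-state response ...)] ...)
       init-state)]))

;; The automaton from above, recognizing c(ad)*r.
(define m
  (automaton init
             (init : (c → more))
             (more : (a → more)
                     (d → more)
                     (r → end))
             (end : accept)))

(m '(c a d d r)) ; → #t
(m '(c a d))     ; → #f
```

Because the states are bound with letrec and only ever call one another in tail position, a long input does not grow the stack, which is the connection between macros and tail calls that the paper draws.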

Dynamically avoiding duplicate identifiers in PLT Scheme

In this thread on the PLT discussion list, the original poster ran into a problem while implementing a DSL: the code he was generating defined some names more than once. The problem is that define will not define the same name twice within a module:

(define x 10)
(define x 12)
=> duplicate definition for identifier in: x

The solution would be to check whether the given name is already bound before defining it: if it is not, use define; otherwise use set!:

(define x 10)
(set! x 12)

Here is Andre’s solution from the thread:

(define-syntax define-if-not
  (λ (stx)
    (syntax-case stx ()
      [(_ var val)
       (with-syntax ([set!define (if (identifier-binding #'var 0)
                                      #'set!
                                      #'define)])
         (syntax
          (set!define var val)))])))

identifier-binding returns #f when the name is a top-level binding or is not bound at all, and a true value otherwise. So if the name is already bound, the macro expands to set!; otherwise it expands to define.
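
To see what identifier-binding reports in each case, here is a small sketch of my own, assuming a current Racket and using the one-argument form of identifier-binding. The macro name binding-kind and the names it is applied to are hypothetical. It expands to a quoted symbol that describes the kind of binding found at expansion time.

```racket
#lang racket
(require (for-syntax racket/base))

;; Expand to a symbol naming what identifier-binding says about `id`.
(define-syntax (binding-kind stx)
  (syntax-case stx ()
    [(_ id)
     (let ([b (identifier-binding #'id)])
       (cond
         [(not b)          #''top-level-or-unbound] ; identifier-binding gave #f
         [(eq? b 'lexical) #''lexical]              ; bound by let, lambda, ...
         [else             #''module]))]))          ; a list describing a module binding

(define lexical-result (let ([y 1]) (binding-kind y)))
(define module-result  (binding-kind car))
(define unbound-result (binding-kind no-such-name-anywhere))
```

Only the first case yields #f, which is the case where define-if-not chooses define.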