4. LingDoc and Grammar Formalisms: Position Paper
In this paper we position the grammar formalisms of Transform and Clarity.
Formalisms for Document and Language Engineering range from general-purpose programming languages to specific, application-oriented languages. Language designers want to maximize the applicability of their formalism; problem solvers want to specify their problem as concisely as possible, relying on implicit assumptions about their problem domain.
For a new problem domain, problem solvers naturally tend to start with a general-purpose programming language, and more specific languages are developed over time. Computational Linguistics has been no exception. As more linguistic theories emerged, more formalisms were invented to suit them. In that process, computer scientists sometimes impose their ideas about formalisms on linguists. Conversely, linguists may have mistaken ideas about the complexity of computer programs and for that reason make inappropriate decisions about the use of formalisms, because they are not aware of the internal optimizations that influence that complexity.
In general, the following requirements for formalisms for Document and Language Engineering may be stated, based upon our experience and observation.
Jurafsky and Martin (2000, 2008) discuss a large number of syntactic formalisms for Computational Linguistics. Some of them have been used only experimentally; others are in use in practical natural language systems. We are interested in the latter category and in the question of how to build effective systems.
Current formalisms, among others, are the following.
Some commercial packages are available for deploying one or more of the above-mentioned formalisms. Some of them try to introduce a graphical representation of the syntax.
Most implementations are provided by universities, as spin-offs of dedicated projects.
Document Engineering has been facilitated by a number of standards, notably those of the W3C. Syntactic formalisms have been developed for, among others, document structure description (such as DTD and schema languages), document transformation (such as XSLT) and document querying (such as XQuery).
Some problems manifest themselves gradually during the course of a project. One should pay attention beforehand in order to avoid them.
Both Transform and Clarity try to meet these requirements by unifying formalisms that are frequently used in Computational Linguistics and Computer Science. Starting with the latter, the extensions needed to solve practical problems in Language and Document Engineering were added gradually. Document structure, linguistic structure and subsequent transformations can be expressed in one common formalism.
During the development of Transform, the emphasis was on exploiting and extending techniques from Computer Science in order to reach efficient solutions for a large number of applications. The property of on-line processing has been maintained throughout.
The formalisms of Transform and
The unification is specified explicitly by comparison of variables.
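As a rough illustration of explicit unification, the following Python sketch recognizes a two-word noun phrase and compares the number features bound by the determiner and the noun. The lexicon, the feature values and the function `parse_np` are hypothetical and are not taken from the formalisms discussed here; the sketch only shows the idea that agreement is stated as an explicit comparison of variables instead of being left to an implicit mechanism.

```python
# Hypothetical toy lexicon: word -> (category, number feature).
LEXICON = {
    "this": ("det", "sg"),
    "these": ("det", "pl"),
    "book": ("noun", "sg"),
    "books": ("noun", "pl"),
}


def parse_np(words):
    """Recognize 'det noun' and return the agreed number, or None on failure."""
    if len(words) != 2:
        return None
    first = LEXICON.get(words[0])
    second = LEXICON.get(words[1])
    if first is None or second is None:
        return None
    det_cat, det_num = first
    noun_cat, noun_num = second
    if det_cat != "det" or noun_cat != "noun":
        return None
    # Explicit unification: the rule succeeds only if the two variables agree.
    if det_num != noun_num:
        return None
    return det_num
```

Under these assumptions, `parse_np(["these", "books"])` succeeds, while `parse_np(["these", "book"])` fails on the comparison.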
The grammatical formalism is declarative, but within grammar rules programming statements may be inserted. They communicate with the grammar rule through variables.
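The interplay between a declarative rule and inserted statements can be sketched in Python as follows. The rule shape `sum -> number ('+' number)*`, the function `parse_sum` and the variable names are assumptions made for illustration; the sketch does not reproduce the actual notation of the formalism. The point is only that the statements read and update variables that the rule binds while it matches.

```python
import re

# Tokens of the toy language: unsigned integers and the plus sign.
TOKEN = re.compile(r"\d+|\+")


def parse_sum(text):
    """Match sum -> number ('+' number)* and compute its value on the fly."""
    compact = text.replace(" ", "")
    tokens = TOKEN.findall(compact)
    if "".join(tokens) != compact:
        raise ValueError("unexpected character in input")
    pos = 0

    def number():
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isdigit():
            value = int(tokens[pos])  # variable bound by the rule
            pos += 1
            return value
        raise ValueError("number expected")

    total = number()  # attached statement: initialise the result
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        total += number()  # attached statement: update through the variable
    if pos != len(tokens):
        raise ValueError("trailing input")
    return total
```

The grammar part stays readable as a rule, while the two marked statements do the computation.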
Transform only. The output of one cascaded grammar is the input for the next one. The grammars are connected by pipes.
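A cascade of this kind can be imitated with Python generators, as in the minimal sketch below. The three stages (`tokenize`, `tag`, `chunk`), the toy lexicon and the helper `cascade` are hypothetical; they stand in for grammars and are not part of Transform. Each stage consumes the output stream of the previous one, much as processes connected by pipes do.

```python
def tokenize(lines):
    """First stage: split each input line into lowercase word tokens."""
    for line in lines:
        for word in line.split():
            yield word.lower()


def tag(tokens):
    """Second stage: attach a category from a hypothetical toy lexicon."""
    lexicon = {"the": "det", "dog": "noun", "barks": "verb"}
    for token in tokens:
        yield (token, lexicon.get(token, "unknown"))


def chunk(tagged):
    """Third stage: merge a determiner and a following noun into one NP."""
    pending = None
    for word, cat in tagged:
        if pending is not None:
            if cat == "noun":
                yield (pending + " " + word, "np")
                pending = None
                continue
            # The held determiner was not followed by a noun: emit it as is.
            yield (pending, "det")
            pending = None
        if cat == "det":
            pending = word  # hold the determiner until the next token arrives
        else:
            yield (word, cat)
    if pending is not None:
        yield (pending, "det")


def cascade(source, *stages):
    """Connect the stages so each consumes the output of the previous one."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return stream
```

For example, `list(cascade(["The dog barks"], tokenize, tag, chunk))` runs the three stages in order. Because the stages are generators, each item moves through the cascade as soon as it is produced, without the whole input being read first.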
Not in
The code is interpreted by a formal automaton.
Transform only. The syntax of the formalism is described by a metagrammar, which is adaptable and written in the same formalism. The user may therefore adjust the syntax to make it more terse or more verbose, or to disallow some sub-formalisms. Transform comes with two metagrammars. The “classic” one was used until 2003 (and is also used in the thesis). The “modern” one follows the syntax of the W3C recommendations.
The following tables give a comprehensive comparison of Transform,
| Formalism | Strong equivalence | CFG | Chomsky type-0 grammar | Regular expressions on character level | Regular expressions on grammar rule level | Characters as tokens |
|---|---|---|---|---|---|---|
| RTN | | | | | x | |
| ATN | | | | | x | |
| Attribute | | | | | x | |
| Affix | | | | | x | |
| APSG | | | | | x | |
| APSG + unification | | | | | x | |
| DCG (in Prolog) | | x | | | | |
| PCFG | | x | | | x | |
| HPSG | | x | | | | |
| TAG | | | | | | |
| LFG | | | | | | |
| Syntax directed translation schemata | | x | | | | |
| PMR | | | | x | | |
| Transform | x | | x | x | x | x |
| | | | | | x | |

| Formalism | Lexemes as tokens | Wild cards | Variables | Unification of variables | Booleans | Correction / translation |
|---|---|---|---|---|---|---|
| RTN | x | | | | | |
| ATN | x | | x | | | |
| Attribute | x | | x | | | |
| Affix | x | | x | | | |
| APSG | x | | x | | | |
| APSG + unification | x | | x | implicit | | |
| DCG (in Prolog) | x | | x | implicit | | |
| PCFG | | | | | | |
| HPSG | | | | | | |
| TAG | | | | | | |
| LFG | | | | | | |
| Syntax directed translation schemata | | | | | | x |
| PMR | | x | x | | x | |
| Transform | x | x | x | explicit on line | x | |
| | x | | x | explicit off line | | x |

| Formalism | Programs attached | Output instructions | Probabilities | Weak equivalent notations | Left recursion | Ambiguities allowed | Error recovery |
|---|---|---|---|---|---|---|---|
| RTN | | | | | x | ? | |
| ATN | | | | | x | ? | |
| Attribute | | | | | | | |
| Affix | | | | | | x | |
| APSG | | | | | | x | |
| APSG + unification | | | | | | x | |
| DCG (in Prolog) | | | | | | x | |
| PCFG | | | x | | | x | |
| HPSG | | | | | | x | |
| TAG | | | | | | x | |
| LFG | | | | | | x | |
| Syntax directed translation schemata | | | | | | | |
| PMR | | x | | | | | |
| Transform | x | x | | x | x | x | x |
| | x | | | | x | x | |

Some wishes for LingDoc Transform are: