GefrManny

SQF/SQS Backus-Naur form syntax?


(Prologue: If you don't know what BNF is about, then you might want to skip this topic since it probably won't matter to you.)

This question is perhaps a little awkward, but I wonder if Bohemia has documented the SQF/SQS syntax in BNF somewhere, or if someone has come up with a usable notation.

It's probably of little interest to scripters, but I think it would matter to anyone who wants to write an IDE for ArmA editing.

Regards,

Manny


I don't think you will find anything but the Biki.


Hi

Thumbs up for this idea!

I happen to know that some people over at www.ofpec.com were working on a syntax-checker program for OFP scripts. That is much the same as what you have in mind. I don't know how far that project got in practice. I will send you a private message telling you whom to contact if you want to know more about it. That person has not been active for some time now, but it is worth a shot.

Best Regards,

Baddo.


I somehow doubt that we'll get a BNF documentation of the scripting language from some official side, whether they have it or not. But I think it would be good to write it down once, although I don't think the syntax of the scripting language would be very complicated to formalize.


It's pretty easy. I've done it already. Not in BNF form, simply because doing tokenized lexical parsing (via a state machine) and then using an abstract syntax tree for semantic parsing works far better. It helps if you want semantic parsing on a separate thread.

...and no... I won't share my method.

oh crap... did I just leak something?
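For readers who have not seen the lexing half of that split before, here is a minimal sketch of a state-machine tokenizer in Python. It is not CrashDome's method (which was not shared); the state names, the token kinds and the punctuation set are all assumptions made for illustration, and SQF details such as doubled quotes inside strings, single-quoted strings and comments are left out.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()
    IDENT = auto()
    NUMBER = auto()
    STRING = auto()


def tokenize(src):
    """Split SQF-like text into (kind, text) pairs with an explicit state machine."""
    tokens = []
    state = State.START
    buf = ""
    i = 0
    while i <= len(src):
        ch = src[i] if i < len(src) else "\0"  # sentinel flushes the last token
        if state is State.START:
            if ch.isalpha() or ch == "_":
                state, buf = State.IDENT, ch
            elif ch.isdigit():
                state, buf = State.NUMBER, ch
            elif ch == '"':
                state, buf = State.STRING, ""
            elif ch in "[]{}();,=+-*/":
                tokens.append(("PUNCT", ch))
            # whitespace, the sentinel and unknown characters are skipped here
            i += 1
        elif state is State.IDENT:
            if ch.isalnum() or ch == "_":
                buf += ch
                i += 1
            else:
                tokens.append(("IDENT", buf))
                state = State.START  # re-examine ch in START, so no advance
        elif state is State.NUMBER:
            if ch.isdigit() or ch == ".":
                buf += ch
                i += 1
            else:
                tokens.append(("NUMBER", buf))
                state = State.START  # re-examine ch in START, so no advance
        else:  # State.STRING
            if i >= len(src):
                raise SyntaxError("unterminated string")
            if ch == '"':
                tokens.append(("STRING", buf))
                state = State.START
            else:
                buf += ch
            i += 1
    return tokens
```

A later pass can then build a tree from the token list without ever looking at raw characters again, which is what makes it practical to run that pass on another thread.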


Oh nice . . . sounds like someone is coming up with a decent scripting editor

  (CrashDome @ April 10 2007,21:15) said:
It's pretty easy. I've done it already. Not in BNF form, simply because doing tokenized lexical parsing (via a state machine) and then using an abstract syntax tree for semantic parsing works far better. It helps if you want semantic parsing on a separate thread.

In theory, LL(n) parsers or Deterministic Finite State Machines for that matter *should* do the job. That is, once you have the time to figure out if the grammar is actually correct and exactly the same as ArmA's. Practically, I fear that there are trade-off decisions to think about concerning the implementation, and several points of big, nasty lookaheads.

  (GefrManny @ April 10 2007,15:17) said:
In theory, LL(n) parsers or Deterministic Finite State Machines for that matter *should* do the job. That is, once you have the time to figure out if the grammar is actually correct and exactly the same as ArmA's. Practically, I fear that there are trade-off decisions to think about concerning the implementation, and several points of big, nasty lookaheads.

LL(n) is plenty for config files. SQF is a bit trickier, but SQS is very simple. It only becomes really difficult if you want to include some sort of pre-compilation expression check. Most errors tend to be typos or incorrect use of brackets and the like, which are easy to check for since they are rudimentary syntax rules; even a simple BNF parser would be fine for that. If you want to check expressions, it gets far too difficult, since variables are not identified by type (i.e. variable X can be a number, an object, etc. at any time).
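As an illustration of how little machinery the "rudimentary" bracket check needs, here is a minimal Python sketch. The function name and the message wording are made up for this example, and it assumes strings are delimited by double quotes only (single-quoted strings and comments are ignored).

```python
PAIRS = {")": "(", "]": "[", "}": "{"}


def check_brackets(src):
    """Return a list of (position, message) problems; empty means balanced."""
    problems = []
    stack = []  # (opening character, position) for every bracket still open
    in_string = False
    for pos, ch in enumerate(src):
        if ch == '"':
            in_string = not in_string  # brackets inside strings are ignored
        elif in_string:
            continue
        elif ch in "([{":
            stack.append((ch, pos))
        elif ch in PAIRS:
            if stack and stack[-1][0] == PAIRS[ch]:
                stack.pop()
            else:
                problems.append((pos, f"unexpected '{ch}'"))
    for ch, pos in stack:
        problems.append((pos, f"unclosed '{ch}'"))
    return problems
```

No type information is needed for this, which is why it stays easy even though variables in SQF can hold anything.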

Although... I must admit parsing is completely over my head in most contexts, and I tend to leave that to guys who have been writing parsers for many, many languages for a long time; I just use the code they give me as best I can.

I have some grammar definitions which are complete, and I've had little to no problems. I am writing a parser for the config files and it is basic. My only concerns were speed and the ability to modify the tokens. For the config files I unfortunately had to trade a separate thread for direct access to the tokens. However, SQS and SQF could be parsed in a separate thread using an AST for speed, at the expense of only showing errors and not modifying anything directly.

Plenty for these languages.

[EDIT] ...and no need for nasty lookaheads....


It is indeed very tricky for SQF; I can confirm that from... errr... let's call it "experience" (parse errors in "correct" code, or vice versa, has a negative connotation). But nasty lookaheads of 2 or more tokens are pretty common at certain decision points within a parser's productions, especially in SQF. Think of a statement (<semicolon> statement)* [<semicolon>] <eof> production, which is valid in SQF. Either that, or I suck. It's probably a bit of both.
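For that particular production, one possible way around the extra lookahead is to consume the semicolon first and only then decide, with a single peek, whether end of input or another statement follows. The Python sketch below shows the idea under a big simplifying assumption: a "statement" is just a non-empty run of tokens other than ";", standing in for a real expression parser. The function name is hypothetical.

```python
def parse_statements(tokens):
    """Parse: statement (';' statement)* [';'] EOF, using one token of lookahead.

    `tokens` is a list of strings. A statement here is simply a non-empty run
    of tokens other than ';' (a placeholder for a real expression parser).
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None  # None marks EOF

    def statement():
        nonlocal pos
        start = pos
        while peek() is not None and peek() != ";":
            pos += 1
        if pos == start:
            raise SyntaxError(f"statement expected at token {pos}")
        return tokens[start:pos]

    result = [statement()]
    while peek() == ";":
        pos += 1              # consume the semicolon unconditionally
        if peek() is None:    # it was the optional trailing one
            break
        result.append(statement())
    return result
```

Whether the same trick carries over to the other decision points in a full SQF grammar is a separate question; this only covers the statement list.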

It gets worse, though: the only way I could let the lexer check function calls and their arguments, without hardcoding all 100+ functions or running into left recursion, is to keep all function-related data in an external file (XML, CSV, TXT, ...) and make pretty awkward-looking checks, mostly at points where the parser identifies an IDENT token (which by my definition is [a-zA-Z_][a-zA-Z0-9_]*) and checks whether there should be any arguments to the left or right of it.
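A rough Python sketch of that kind of table-driven check follows. The CSV layout (name, left, right), the three sample rows and the function names are assumptions for illustration only; a real table would be loaded from an external file and cover the full command list. The neighbour test is deliberately crude: it only asks whether some token sits next to the command, not whether that token is a sensible argument.

```python
import csv
import io
import re

IDENT = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

# Hypothetical table; in practice this would live in an external file.
COMMANDS_CSV = """name,left,right
player,0,0
getpos,0,1
domove,1,1
"""

LEFT_STOP = {";", "=", "(", "[", "{", ","}   # tokens that cannot be a left argument
RIGHT_STOP = {";", ")", "]", "}", ","}       # tokens that cannot be a right argument


def load_commands(text):
    """Map lower-cased command name -> (takes_left, takes_right)."""
    rows = csv.DictReader(io.StringIO(text))
    return {r["name"].lower(): (r["left"] == "1", r["right"] == "1") for r in rows}


def check_arity(tokens, commands):
    """Report commands whose required left or right neighbour is missing."""
    errors = []
    for i, tok in enumerate(tokens):
        if not IDENT.fullmatch(tok) or tok.lower() not in commands:
            continue  # variables, literals and punctuation are not checked here
        takes_left, takes_right = commands[tok.lower()]
        has_left = i > 0 and tokens[i - 1] not in LEFT_STOP
        has_right = i + 1 < len(tokens) and tokens[i + 1] not in RIGHT_STOP
        if takes_left and not has_left:
            errors.append(f"{tok}: missing left argument")
        if takes_right and not has_right:
            errors.append(f"{tok}: missing right argument")
    return errors
```

Names are lower-cased before lookup because SQF commands are case-insensitive.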

I, by the way, do actually try to verify expression types wherever sane and possible as an added fun bonus. However, it really has its limits.

Also, I have to agree with you in one particular statement: I must admit parsing is completely over my head in most contexts. You do seem fairly competent to me, so don't degrade yourself. I, for one, would welcome a little joint effort.


You could contact the person I gave you info about if you didn't do it already. You could all team up so we could see something usable someday. As far as I know there was certainly quite a lot of thinking done in that project (even a prototype of some sort was produced, as a proof-of-concept) and it would not be good if it is all left unused.


Most of what you have mentioned is why I prefer the broken out lexical parsing from the semantics. In one pass I can read a pre-made document and get all the tokens.

I DO have an XML variation of the command list which is actually parsed from the wiki via regex (parsed once, then XML built for editing/tweaking and inclusion in my project). This requires me to reparse the wiki with any new commands or manually add/change them. Either way, it is the only way to get a good structure for commands.

Since I have the commands, I have a command token I can assign to them. Global and private identifiers are also individual tokens, along with pretty much everything else. It wasn't easy with SQF, but I did that first, and it probably only took me three days to get it 99% done.

I haven't done the SQF semantics yet, but since I have commands in XML complete with parameters, I can build a very simple system of checking for small coding errors (i.e. missing brackets after "then"). But, as you suggested, the structure of SQF is fully flexible. Most scripting languages are. The problem is, it makes checking for code errors very difficult. For example:

Code Sample:

    this doMove getpos Target

is completely valid: doMove accepts a parameter on its right, and getpos does not require a left-hand parameter. But the result depends on the type getpos returns (a position array).

Code Sample:

    this doMove getDir Target

is completely INVALID. Again, we only learn that by checking the return type against the required parameter... but which command gets priority???

it gets very complicated... it is certainly do-able... but is not anywhere near being easy.
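To make the two samples concrete, here is a small Python sketch of that return-type check. It assumes unary commands bind tighter than binary ones, so getpos/getDir are evaluated before doMove sees their result; treat that as an assumption to verify against the game rather than a given. The type tables are a simplified, hand-written stand-in for the command list, and any variable of unknown type is treated as "ANY", which is exactly where such a checker runs out of information.

```python
# Simplified, hand-written signatures (assumptions for this sketch).
UNARY = {"getpos": ("OBJECT", "ARRAY"), "getdir": ("OBJECT", "NUMBER")}  # arg -> result
BINARY = {"domove": ("OBJECT", "ARRAY", "NOTHING")}                     # left, right -> result


def compatible(actual, expected):
    """An unknown type is accepted anywhere; a known type must match exactly."""
    return actual == "ANY" or actual == expected


def check(tokens, var_types=None):
    """Type-check `operand (binary operand)*`, where operand is `unary* name`."""
    var_types = var_types or {}
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("operand expected")
        tok = tokens[pos]
        pos += 1
        if tok.lower() in UNARY:
            arg_type, result = UNARY[tok.lower()]
            actual = operand()  # unary commands grab the operand to their right
            if not compatible(actual, arg_type):
                raise TypeError(f"{tok} expects {arg_type}, got {actual}")
            return result
        return var_types.get(tok, "ANY")

    left = operand()
    while pos < len(tokens):
        op = tokens[pos]
        pos += 1
        if op.lower() not in BINARY:
            raise SyntaxError(f"binary command expected, got {op}")
        l_type, r_type, result = BINARY[op.lower()]
        right = operand()
        for actual, expected in ((left, l_type), (right, r_type)):
            if not compatible(actual, expected):
                raise TypeError(f"{op} expects {expected}, got {actual}")
        left = result
    return left
```

With these tables, `check(["this", "doMove", "getpos", "Target"])` passes, while the getDir variant raises a TypeError because doMove receives a NUMBER where it wants an ARRAY.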

SQM, Config, EXT, etc...

They are so easy, it should be used as a tutorial:

Quote:

    class {ClassName} : {Optional Inheritance}
    {
        propertyA = "string";
        propertyB = 1.000000;
        propertyC[] = {"array","array"...};
        class {SubClassName} : {xxxxx}
        {
        };
    };

This is off the top of my head, but a top-down LL(n) recursive parse is perfect. The rules are simple and short: one keyword, "class", and everything else in code-block form.

The only "optional" rules are Inheritance and Arraylists (check for ":" and "[]" respectively).
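The rules above can be sketched as a small recursive-descent parser in Python. This is only an illustration under assumptions: the token names, the Parser class and the dictionary it returns are invented for the example, arrays are flat (no nesting), and #define / #include are not handled.

```python
import re

TOKEN = re.compile(
    r'\s*(?:(?P<str>"[^"]*")|(?P<num>-?\d+(?:\.\d+)?)'
    r'|(?P<id>[A-Za-z_]\w*)|(?P<sym>\[\]|[{}:;=,]))'
)


def lex(src):
    """Turn config text into a list of (kind, text) tokens."""
    src = src.rstrip()
    pos, out = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        out.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return out


class Parser:
    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else ("eof", "")

    def expect(self, text):
        if self.peek()[1] != text:
            raise SyntaxError(f"expected {text!r}, got {self.peek()[1]!r}")
        self.pos += 1

    def ident(self):
        kind, text = self.peek()
        if kind != "id":
            raise SyntaxError(f"identifier expected, got {text!r}")
        self.pos += 1
        return text

    def value(self):
        kind, text = self.peek()
        if kind not in ("str", "num"):
            raise SyntaxError(f"value expected, got {text!r}")
        self.pos += 1
        return text.strip('"') if kind == "str" else float(text)

    def parse_class(self):
        self.expect("class")
        name, parent = self.ident(), None
        if self.peek()[1] == ":":            # optional inheritance
            self.pos += 1
            parent = self.ident()
        self.expect("{")
        props, classes = {}, []
        while self.peek()[1] != "}":
            if self.peek()[1] == "class":    # nested class: recurse
                classes.append(self.parse_class())
                continue
            key = self.ident()
            if self.peek()[1] == "[]":       # array property
                self.pos += 1
                self.expect("=")
                self.expect("{")
                items = []
                while self.peek()[1] != "}":
                    items.append(self.value())
                    if self.peek()[1] == ",":
                        self.pos += 1
                self.expect("}")
                props[key] = items
            else:                            # scalar property
                self.expect("=")
                props[key] = self.value()
            self.expect(";")
        self.expect("}")
        self.expect(";")
        return {"name": name, "parent": parent, "props": props, "classes": classes}
```

Calling `Parser(lex(text)).parse_class()` returns a nested dictionary for one top-level class. The only two decisions made by peeking are the ":" and "[]" checks mentioned above, plus the "class" keyword for nesting.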

I also added support for #define and #include.

So incredibly easy....

I wrapped my tokens into custom object tree. I can do anything with them.


Has anyone yet made a context-free BNF for SQF ?

I'm working on it in order to convert SQF into Python scripts for an idea I have. So far it's going quite well; I'm using Python PLY, but it's not finished. If anyone has a BNF for SQF, please let me know; I would like to share.


There are some checkers out there but is there any open source checker or some document describing the BNF of Arma floating around? It would be nice if we could start a community project, instead of having many people re-inventing the wheel every odd year or so.

