Lexer rule that allows identifiers to begin with digits #4854

The-Futurist · 2025-06-21T18:49:46Z

The-Futurist
Jun 21, 2025

I'm considering updating the grammar for a new language so that identifiers may begin with one or more digits. This seems to work, and my tests pass, but am I missing an edge case?

IDENTIFIER: [0-9]* [a-zA-Z_] [a-zA-Z0-9_]*;

Prior to this tweak, the rule was simply:

IDENTIFIER: [a-zA-Z_] [a-zA-Z0-9_]*;
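One way to hunt for edge cases is to run the candidate pattern over strings that look like numeric literals in other languages. The sketch below uses Python's `re` module as a stand-in for the lexer rule. It only checks whether a whole string fits the pattern and does not model competition between ANTLR rules; the sample strings are illustrative and not taken from the language in question.

```python
import re

# Python equivalent of: IDENTIFIER: [0-9]* [a-zA-Z_] [a-zA-Z0-9_]*;
IDENTIFIER = re.compile(r"[0-9]*[a-zA-Z_][a-zA-Z0-9_]*")

def is_identifier(text: str) -> bool:
    """Return True when the whole string fits the digit-leading pattern."""
    return IDENTIFIER.fullmatch(text) is not None

# Strings that resemble numeric literals in C-like languages (illustrative).
samples = ["250kbps", "0x22", "1e5", "10L", "1_000", "123", "12.5"]
for s in samples:
    print(f"{s!r:10} -> {is_identifier(s)}")
```

Under this pattern `0x22`, `1e5`, `10L` and `1_000` all qualify as identifiers, so the rule is only safe if the language never gives those spellings a numeric meaning.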

In fact I'd be interested to be able to class the following as identifiers too:

123.5nF
0.1KHz

Bear in mind that the language uses DOT as a separator in structure member references and namespaces. That should all still work, because those are parser rules: once an identifier token has been recognized, the fact that it might contain a DOT (as shown above) is immaterial.

I do currently recognize this kind of thing as a DECIMAL token:

123.456
0.11055
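Whether 123.5nF ends up as one token is decided by the lexer, not the parser, so it may help to see how the rules compete. Below is a minimal sketch of longest-match tokenization in Python. The DECIMAL and DOT patterns are assumptions (the thread does not show them), and the IDENTIFIER pattern is the digit-leading one from above, which has no dot in it.

```python
import re

# Assumed token patterns; DECIMAL and DOT are guesses at the real grammar.
RULES = [
    ("DECIMAL", re.compile(r"[0-9]+\.[0-9]+")),
    ("IDENTIFIER", re.compile(r"[0-9]*[a-zA-Z_][a-zA-Z0-9_]*")),
    ("DOT", re.compile(r"\.")),
]

def tokenize(text: str) -> list[tuple[str, str]]:
    """Longest match wins; on equal length the earlier rule wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

print(tokenize("123.5nF"))  # DECIMAL '123.5' followed by IDENTIFIER 'nF'
print(tokenize("rec.2nd"))  # IDENTIFIER 'rec', DOT '.', IDENTIFIER '2nd'
```

With these assumed rules, 123.5nF splits into two tokens. To get a single token, the rule itself would have to admit the dot, and that rule would then compete with DOT-separated member references such as rec.2nd.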

ericvergnaud · 2025-06-22T09:45:11Z

ericvergnaud
Jun 22, 2025
Maintainer

If your grammar also supports non-decimal numbers, you might have a problem: is 0x22 an integer or an identifier?

1 reply

The-Futurist Jun 22, 2025
Author

0x22 would match an identifier (not one that I'd choose of course!).

The language has a distinct notation for numeric literals:

HEX: 0F3C004D:h and 03D1 47A2:H are two examples of legal hex literals (and a similar thing goes for Octal).
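A hex literal in this notation also starts like a digit-leading identifier, so the two rules overlap at the front of the input. The following sketch compares match lengths for both. The HEX pattern is hypothetical, reconstructed only from the two examples above (hex digit groups separated by single spaces, then `:h` or `:H`); the real rule may differ.

```python
import re

# Hypothetical HEX pattern inferred from the two examples; not the real rule.
HEX = re.compile(r"[0-9A-Fa-f]+(?: [0-9A-Fa-f]+)*:[hH]")
IDENTIFIER = re.compile(r"[0-9]*[a-zA-Z_][a-zA-Z0-9_]*")

def winner(text: str) -> tuple[str, str]:
    """Return (rule, lexeme) for the longest match at offset 0; HEX wins ties."""
    best = ("NONE", "")
    for name, pattern in (("HEX", HEX), ("IDENTIFIER", IDENTIFIER)):
        m = pattern.match(text)
        if m and len(m.group()) > len(best[1]):
            best = (name, m.group())
    return best

for sample in ("0F3C004D:h", "03D1 47A2:H", "0F3C004D"):
    print(sample, "->", winner(sample))
```

Under these assumptions the suffix makes the HEX match longer, so it wins. The third sample shows the flip side: a hex literal with the suffix left off lexes as an ordinary identifier instead of producing a lexer error.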

I dropped the idea of 123.5nF and so on; while possible (I got it to work), it is undesirable really. But identifiers that can optionally begin with digits seem reasonable if someone wanted to use them. For example, this is some recent C# code from an nRF24L01+ transceiver library:

    public enum DataRate : byte
    {
        Min = 2, // 250 kbps
        Med = 0, // 1Mbps
        Max = 1  // 2Mbps
    }

That could now be represented (in this different language) as:

    enum DataRate byte
        250kbps = 2, 
        1000kbps = 0, 
        2000kbps = 1  // 2Mbps
    end

so it's primarily aimed at enums and constants (and perhaps could be grammatically restricted to just these use cases).

I suppose too that if one confined this notation to literals, one could also allow a dot:

freq = 123.5MHz

resistance = 10.5Mohm

Here the language would regard these tokens as say USER_LITERAL and allow the user to declare these as some new type literal:

dcl 123.5MHz literal (123.5)

Here the token is just a name that stands in for some other conventional literal. A USER_LITERAL token can only be used in the declaration of a literal as shown above (i.e. it's not a variable and cannot be assigned to); in reality it is a float constant that happens to have an unusual name.
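The declaration scheme described here amounts to a compile-time table from literal names to values. A minimal sketch in Python follows; the `LiteralTable` class, its method names, and the USER_LITERAL pattern are all hypothetical and only illustrate the idea.

```python
import re

# Assumed spelling of a user literal: a decimal number followed by a name.
USER_LITERAL = re.compile(r"[0-9]+(?:\.[0-9]+)?[a-zA-Z_][a-zA-Z0-9_]*")

class LiteralTable:
    """Maps declared user-literal names to their underlying float values."""

    def __init__(self) -> None:
        self._values: dict[str, float] = {}

    def declare(self, name: str, value: float) -> None:
        # Models a declaration such as: dcl 123.5MHz literal (123.5)
        if not USER_LITERAL.fullmatch(name):
            raise ValueError(f"{name!r} is not a valid user literal")
        if name in self._values:
            raise ValueError(f"{name!r} is already declared")
        self._values[name] = value

    def resolve(self, name: str) -> float:
        # An undeclared literal is an error, not a variable lookup.
        if name not in self._values:
            raise KeyError(f"undeclared literal {name!r}")
        return self._values[name]

table = LiteralTable()
table.declare("123.5MHz", 123.5)
table.declare("10.5Mohm", 10.5)
print(table.resolve("123.5MHz"))  # 123.5
```

Because the name is opaque, nothing stops a declaration like `dcl 123.5MHz literal (7.0)`; whether the compiler should check the numeric prefix against the declared value is a separate design decision.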

I'm just wondering if there's some unexpected behavior, something that would match that I'm not anticipating.

The-Futurist · 2025-06-22T17:33:15Z

The-Futurist
Jun 22, 2025
Author

OK I think I have solved my problem now. Rather than messing with identifiers, I just added a new kind of literal (based on a new token LITERAL) and added that to the list of allowed literal types within expressions.

IDENTIFIER:  [a-zA-Z_] [a-zA-Z0-9_]*;
LITERAL: (DECIMAL_PATTERN)? IDENTIFIER;

with

primitiveExpression
  : numericLiteral
  | stringLiteral
  | literal
  | reference
  ;
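One thing worth checking with this pair of rules: because the DECIMAL_PATTERN prefix is optional, LITERAL also matches every plain identifier with the same length as IDENTIFIER. ANTLR resolves such ties in favor of the rule that appears first in the grammar, so rule order matters. The sketch below imitates that behavior in Python. The definition of DECIMAL_PATTERN and the DECIMAL, EQ and WS rules are assumptions, since the thread does not show them.

```python
import re

# DECIMAL_PATTERN is not shown in the thread; this definition is an assumption.
DECIMAL_PATTERN = r"[0-9]+(?:\.[0-9]+)?"
IDENT = r"[a-zA-Z_][a-zA-Z0-9_]*"

# Order matters: ties on length go to the rule listed first, as in ANTLR.
RULES = [
    ("IDENTIFIER", re.compile(IDENT)),
    ("LITERAL", re.compile(f"(?:{DECIMAL_PATTERN})?{IDENT}")),
    ("DECIMAL", re.compile(DECIMAL_PATTERN)),
    ("EQ", re.compile(r"=")),
    ("WS", re.compile(r"\s+")),
]

def tokenize(text: str) -> list[tuple[str, str]]:
    """Longest match wins, first rule wins ties; whitespace is skipped."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name != "WS":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

print(tokenize("freq = 123.5MHz"))
```

With IDENTIFIER listed first, freq stays an IDENTIFIER and 123.5MHz becomes a LITERAL. If the two rules were swapped, plain names would come out as LITERAL tokens and the parser's reference rule would presumably stop matching them.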

The language itself will then provide a way to declare such literals with actual decimal values, and it is that value that the compiler will see and use.

0 replies
