
Bugfix for scrubber sample code which fails when scrubbing "two" #232

@0dB

Description


The code in the "Scrubber" section of https://derwen.ai/docs/ptr/sample/ has a small bug: when a token in the prefix list also occurs as a standalone term in the text, like "two", the while loop consumes the whole span, and span[0] then fails with an IndexError. Easy fix:

In this code (using my tokens instead of the ones on the page):

from spacy.tokens import Span

def prefix_scrubber():
    def scrubber_func(span: Span) -> str:
        while span[0].text in ("every", "other", "the", "two"): # ATTN: different tokens; will fail in the original code
            span = span[1:]
        return span.text
    return scrubber_func

just add a len(span) > 1 guard, i.e. replace

while span[0].text in ("every", "other", "the", "two"):

with

while len(span) > 1 and span[0].text in ("every", "other", "the", "two"):

to get

def prefix_scrubber():
    def scrubber_func(span: Span) -> str:
        while len(span) > 1 and span[0].text in ("every", "other", "the", "two"):
            span = span[1:]
        return span.text
    return scrubber_func

Now, for the sample used on that page, I get

0.13134098, 05, sentences, [sentences, the two sentences, sentences, two sentences, the sentences]
0.07117996, 02, sentence, [every sentence, every other sentence]

and the line for "two" is still fine:

0.00000000, 02, two, [two, two]

You are welcome to use the token list I used, ("every", "other", "the", "two"); it produces even more merged results than the example on the page.
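The effect of the guard can be demonstrated without loading spaCy at all, using a small stand-in class. Note that Tok and FakeSpan below are hypothetical helpers written for this sketch (not part of spaCy or pytextrank); they mimic only the Span behavior the scrubber relies on — indexing, slicing, len(), and .text:

```python
from dataclasses import dataclass

@dataclass
class Tok:
    """Stand-in for a spaCy Token: just carries .text."""
    text: str

class FakeSpan:
    """Stand-in for spacy.tokens.Span: supports [i], [1:], len(), .text."""
    def __init__(self, words):
        self._toks = [Tok(w) for w in words]
    def __getitem__(self, key):
        if isinstance(key, slice):
            sub = FakeSpan([])
            sub._toks = self._toks[key]
            return sub
        return self._toks[key]
    def __len__(self):
        return len(self._toks)
    @property
    def text(self):
        return " ".join(t.text for t in self._toks)

PREFIXES = ("every", "other", "the", "two")

def scrub_original(span):
    # original sample code: no length guard, so a span made
    # entirely of prefix tokens is consumed and span[0] raises
    while span[0].text in PREFIXES:
        span = span[1:]
    return span.text

def scrub_fixed(span):
    # fixed version: stop before the span is fully consumed
    while len(span) > 1 and span[0].text in PREFIXES:
        span = span[1:]
    return span.text
```

Calling scrub_fixed on a span like ["the", "two", "sentences"] strips the prefixes and returns "sentences", and on the single-token span ["two"] it returns "two" unchanged, whereas scrub_original raises IndexError on that same input.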
