Screen.draw() silently drops text due to handling of SOH/STX #190

@moyix

In bash, \[ and \] (common in shell prompts) are apparently turned into SOH (0x01) and STX (0x02), respectively. I believe the parser in Stream does not handle them correctly: sequences like \x02text are passed to Screen.draw(), which bails on the first control character it encounters and therefore drops the text that follows it.

import pyte
def box(lines):
    w = len(lines[0])
    tb = f"+{'-'*w}+"
    return '\n'.join([tb]+[f'|{l}|' for l in lines]+[tb])
screen = pyte.Screen(10, 3)
stream = pyte.Stream(screen)
stream.feed('hello\r\n\x01dropped\x02dropped\r\n')
print(box(screen.display))
# Prints:
# +----------+
# |hello     |
# |          |
# |          |
# +----------+

The DebugStream gives:

["draw", ["hello"], {}]
["carriage_return", [], {}]
["linefeed", [], {}]
["draw", ["\u0001dropped\u0002dropped"], {}]
["carriage_return", [], {}]
["linefeed", [], {}]

The root cause seems to be the following regex to match "plain text":

pyte/pyte/streams.py

Lines 134 to 140 in 636b679

#: A regular expression pattern matching everything what can be
#: considered plain text.
_special = set([ctrl.ESC, ctrl.CSI_C1, ctrl.NUL, ctrl.DEL, ctrl.OSC_C1])
_special.update(basic)
_text_pattern = re.compile(
    "[^" + "".join(map(re.escape, _special)) + "]+")
del _special
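
To make the effect of this pattern concrete, here is a small standalone sketch that rebuilds it with only the standard library. The literal values for ESC, CSI_C1, NUL, DEL and OSC_C1, and the contents of `BASIC` (BEL, BS, HT, LF, VT, FF, CR, SO, SI), are my assumptions about what pyte defines, not copied from the library, so treat it as an illustration rather than the exact upstream behaviour.

```python
import re

# Assumed values for the constants named in the excerpt (ASCII / C1 codes).
ESC, CSI_C1, NUL, DEL, OSC_C1 = "\x1b", "\x9b", "\x00", "\x7f", "\x9d"
# Assumed contents of `basic`: BEL, BS, HT, LF, VT, FF, CR, SO, SI.
BASIC = set("\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f")

special = {ESC, CSI_C1, NUL, DEL, OSC_C1} | BASIC
# sorted() only makes the pattern string deterministic; it does not change what matches.
text_pattern = re.compile("[^" + "".join(map(re.escape, sorted(special))) + "]+")

# SOH and STX are not in the excluded set, so they count as "plain text".
match = text_pattern.match("\x01dropped\x02dropped\r\n")
matched = match.group()
print(repr(matched))  # '\x01dropped\x02dropped'
```

Under these assumptions the whole run, control characters included, is handed to draw() as a single chunk, which matches the DebugStream output above.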

This pattern is then used to find chunks of plain text that can be passed to draw():

pyte/pyte/streams.py

Lines 190 to 206 in 636b679

draw = self.listener.draw
match_text = self._text_pattern.match
taking_plain_text = self._taking_plain_text
length = len(data)
offset = 0
while offset < length:
    if taking_plain_text:
        match = match_text(data, offset)
        if match:
            start, offset = match.span()
            draw(data[start:offset])
        else:
            taking_plain_text = False
    else:
        taking_plain_text = send(data[offset:offset + 1])
        offset += 1

So a minimal solution (which works for me) would just be to add SOH and STX to the _special set so that they aren't treated as plain text. But maybe all of the control characters (0x00-0x1F) should be excluded as well?
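
The two options can be compared with a rough standalone sketch. It is not the patch itself: `make_text_pattern` and `split_chunks` are hypothetical helpers, the constant values are assumed as above, and the loop is a simplified stand-in for feed() that ignores the parser state and only records how the input would be split.

```python
import re

# Assumed values for the constants named in the excerpt (ASCII / C1 codes).
ESC, CSI_C1, NUL, DEL, OSC_C1 = "\x1b", "\x9b", "\x00", "\x7f", "\x9d"
BASIC = set("\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f")  # assumed contents of `basic`
SOH, STX = "\x01", "\x02"

def make_text_pattern(special):
    """Build a 'plain text' pattern that excludes every character in `special`."""
    return re.compile("[^" + "".join(map(re.escape, sorted(special))) + "]+")

def split_chunks(pattern, data):
    """Simplified feed() loop: plain-text runs go out whole, other chars one at a time."""
    out, offset = [], 0
    while offset < len(data):
        match = pattern.match(data, offset)
        if match:
            out.append(("text", match.group()))
            offset = match.end()
        else:
            out.append(("char", data[offset]))
            offset += 1
    return out

current = {ESC, CSI_C1, NUL, DEL, OSC_C1} | BASIC
minimal = current | {SOH, STX}                      # only add SOH and STX
broad = current | {chr(c) for c in range(0x20)}     # exclude all of 0x00-0x1F

data = "\x01dropped\x02dropped"
for name, special in [("current", current), ("minimal", minimal), ("broad", broad)]:
    print(name, split_chunks(make_text_pattern(special), data))
```

With the minimal set, SOH and STX are split off from the surrounding text, but another control character such as \x03 would still be swallowed into a text chunk; the broad set splits those off as well.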

Here is a quick patch. Happy to make a PR: master...moyix:pyte:moyix/fix_ctrl_chars
