FTP frontend should support Unicode filenames encoded as UTF-8 #682

Open
opened 2009-04-17 13:32:35 +00:00 by arthur · 13 comments
arthur commented 2009-04-17 13:32:35 +00:00
Owner

Using ncftp to put a file whose name contains an accented é, I get the following message:

Requested action not taken: internal server error

In the server-side logs:

2009-04-17 15:22:07+0200 [ProtocolWrapper,3,127.0.0.1] Unhandled Error
        Traceback (most recent call last):
          File "/usr/lib/python2.5/site-packages/twisted/internet/tcp.py", line 362, in doRead
            return self.protocol.dataReceived(data)
          File "/usr/lib/python2.5/site-packages/twisted/protocols/policies.py", line 72, in dataReceived
            self.wrappedProtocol.dataReceived(data)
          File "/usr/lib/python2.5/site-packages/twisted/protocols/basic.py", line 231, in dataReceived
            why = self.lineReceived(line)
          File "/usr/lib/python2.5/site-packages/twisted/protocols/ftp.py", line 698, in lineReceived
            d = defer.maybeDeferred(self.processCommand, cmd, *args)
        --- <exception caught here> ---
          File "/usr/lib/python2.5/site-packages/twisted/internet/defer.py", line 106, in maybeDeferred
            result = f(*args, **kw)
          File "/usr/lib/python2.5/site-packages/twisted/protocols/ftp.py", line 729, in processCommand
            return method(*params)
          File "/usr/lib/python2.5/site-packages/twisted/protocols/ftp.py", line 1079, in ftp_STOR
            d = self.shell.openForWriting(newsegs)
          File "/usr/lib/python2.5/site-packages/allmydata/frontends/ftpd.py", line 255, in openForWriting
            path = [unicode(p) for p in path]
        exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 21: ordinal not in range(128)
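The failing line in the traceback, `path = [unicode(p) for p in path]`, decodes each byte string with Python 2's implicit ASCII default. Below is a minimal sketch of the difference an explicit codec makes, written in Python 3 syntax. `decode_path_segments` is a hypothetical helper, not a function from ftpd.py.

```python
def decode_path_segments(segments, encoding="utf-8"):
    """Decode raw path segments with an explicit codec (hypothetical helper)."""
    return [seg.decode(encoding) for seg in segments]


# A path whose last segment contains a non-ASCII character, encoded as UTF-8.
raw_path = [b"docs", "r\u00e9sum\u00e9.txt".encode("utf-8")]

# Decoding explicitly as UTF-8 succeeds.
decoded = decode_path_segments(raw_path)

# Decoding as ASCII (the implicit Python 2 default) fails on the first
# byte outside the 0-127 range, as in the traceback above.
ascii_error = None
try:
    decode_path_segments(raw_path, "ascii")
except UnicodeDecodeError as e:
    ascii_error = e
```

This only illustrates the failure mode; which encoding the frontend should assume is what the rest of this ticket discusses.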
tahoe-lafs added the
unknown
major
defect
1.3.0
labels 2009-04-17 13:32:35 +00:00
tahoe-lafs added this to the undecided milestone 2009-04-17 13:32:35 +00:00
francois commented 2009-04-23 10:05:58 +00:00
Author
Owner

This is definitely the same sort of encoding issue as in #534. I'll try to have a look at it.

tahoe-lafs modified the milestone from undecided to 1.5.0 2009-04-23 10:05:58 +00:00
tahoe-lafs modified the milestone from 1.5.0 to eventually 2009-06-30 17:16:35 +00:00
warner commented 2009-07-11 11:28:04 +00:00
Author
Owner

reformatted description slightly

tahoe-lafs added
code-frontend
and removed
unknown
labels 2009-07-11 11:28:04 +00:00
davidsarah commented 2010-01-15 02:41:01 +00:00
Author
Owner

See [RFC 2640](http://tools.ietf.org/html/rfc2640) for FTP internationalization.
davidsarah commented 2010-06-15 23:41:39 +00:00
Author
Owner

Replying to davidsarah:

See RFC 2640 for FTP internationalization.

Summary:

  • include UTF8 in the response to a FEAT request;
  • use UTF-8 for pathnames sent and received;
  • reject filenames that are not valid UTF-8.

Admirably simple :-)

(See also #1076 about normalization, but that will probably be done in the dirnode interface rather than in frontends.)

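The third point of the summary (reject filenames that are not valid UTF-8) could be sketched roughly as follows. `InvalidFilenameError` and `decode_ftp_pathname` are hypothetical names for illustration, not part of Tahoe-LAFS or Twisted.

```python
class InvalidFilenameError(Exception):
    """Hypothetical error for pathnames that are not valid UTF-8."""


def decode_ftp_pathname(raw):
    """Strictly decode a raw pathname as UTF-8, rejecting anything else."""
    try:
        return raw.decode("utf-8", "strict")
    except UnicodeDecodeError as e:
        raise InvalidFilenameError(
            "pathname is not valid UTF-8: %r" % (raw,)) from e


# A UTF-8 encoded name is accepted.
accepted = decode_ftp_pathname("caf\u00e9.txt".encode("utf-8"))

# The same name encoded as Latin-1 is rejected rather than guessed at.
try:
    decode_ftp_pathname(b"caf\xe9.txt")
    rejected = False
except InvalidFilenameError:
    rejected = True
```

Strict decoding is the point of the sketch: a lenient mode such as `"replace"` would silently store a different filename than the client sent.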
davidsarah commented 2010-06-15 23:50:30 +00:00
Author
Owner

Hmm, judging by the exception message ("'ascii' codec can't decode byte 0xe0"), ncftp was trying to use ISO-Latin-1 rather than UTF-8. But at least it would be possible for clients to do the right thing, so I still think we should implement RFC 2640.

davidsarah commented 2010-06-15 23:57:51 +00:00
Author
Owner

Actually 'é' is 0xE9 in ISO-Latin-1, so I don't know what encoding this was (but not UTF-8).

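The byte values mentioned above can be checked directly. This is a small Python 3 sketch; `is_valid_utf8` is a hypothetical helper.

```python
E_ACUTE = "\u00e9"   # é
A_GRAVE = "\u00e0"   # à

# é is the single byte 0xE9 in ISO-Latin-1 and the pair 0xC3 0xA9 in UTF-8.
latin1_bytes = E_ACUTE.encode("latin-1")
utf8_bytes = E_ACUTE.encode("utf-8")

# The byte from the exception message, 0xE0, is à in ISO-Latin-1.
latin1_of_e0 = b"\xe0".decode("latin-1")


def is_valid_utf8(data):
    """Return True if data decodes strictly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A lone 0xE0 byte is the start of a three-byte UTF-8 sequence, so on its
# own it is not valid UTF-8.
lone_e0_is_utf8 = is_valid_utf8(b"\xe0")
```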
tahoe-lafs changed title from FTP frontend refuses accents to FTP frontend should support Unicode filenames 2010-06-16 00:04:51 +00:00
zooko commented 2010-06-16 16:45:50 +00:00
Author
Owner

With the new improved pyutil-1.7.9 you get this handy-dandy script called "try_decoding":

HACL:~/playground/pyutil/bothw$ python -c 'open("d","wb").write(chr(0xe0))'
HACL:~/playground/pyutil/bothw$ try_decoding d -t  é
HACL:~/playground/pyutil/bothw$ 

Oh hey there are no encodings known to Python 2.6.1 which would decode 0xe0 to é!

Here are all the things that all the encodings would decode 0xe0 to:

HACL Zooko-Ofsimplegeos-MacBook-Pro:~/playground/pyutil/bothw$ try_decoding d
            charmap : à
              cp037 : \
             cp1006 : ﻓ
             cp1026 : ü
             cp1140 : \
             cp1250 : ŕ
             cp1251 : а
             cp1252 : à
             cp1253 : ΰ
             cp1254 : à
             cp1255 : א
             cp1256 : à
             cp1257 : ą
             cp1258 : à
              cp424 : \
              cp437 : α
              cp500 : \
              cp737 : ω
              cp775 : Ó
              cp850 : Ó
              cp852 : Ó
              cp855 : Я
              cp857 : Ó
              cp860 : α
              cp861 : α
              cp862 : α
              cp863 : α
              cp864 : ـ
              cp865 : α
              cp866 : р
              cp869 : ζ
              cp874 : เ
              cp875 : \
          hp_roman8 : Á
          iso8859_1 : à
         iso8859_10 : ā
         iso8859_11 : เ
         iso8859_13 : ą
         iso8859_14 : à
         iso8859_15 : à
         iso8859_16 : à
          iso8859_2 : ŕ
          iso8859_3 : à
          iso8859_4 : ā
          iso8859_5 : р
          iso8859_6 : ـ
          iso8859_7 : ΰ
          iso8859_8 : א
          iso8859_9 : à
             koi8_r : Ю
             koi8_u : Ю
            latin_1 : à
         mac_arabic : ـ
       mac_centeuro : ŗ
       mac_croatian : –
       mac_cyrillic : а
          mac_farsi : ـ
          mac_greek : ύ
        mac_iceland : ý
         mac_latin2 : ŗ
          mac_roman : ‡
       mac_romanian : ‡
        mac_turkish : ‡
             palmos : à
            ptcp154 : а
 raw_unicode_escape : à
             rot_13 : à
            tis_620 : เ
     unicode_escape : à
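For readers without pyutil installed, here is a rough stand-in for this kind of survey, written for Python 3 using only the standard library. It is not pyutil's implementation of try_decoding; `try_decodings` is a hypothetical function, and the set of codecs it finds depends on the Python version and platform.

```python
import encodings.aliases


def try_decodings(data, target=None):
    """Try every codec known to this Python on data.

    Returns a dict mapping codec name to decoded text. If target is
    given, only codecs that decode data to exactly target are kept.
    """
    results = {}
    for name in sorted(set(encodings.aliases.aliases.values())):
        try:
            text = data.decode(name)
        except Exception:
            # Skip codecs that reject these bytes, are not text
            # encodings, or are unavailable on this platform.
            continue
        if target is None or text == target:
            results[name] = text
    return results


# Survey what the single byte 0xE0 decodes to.
candidates = try_decodings(b"\xe0")

# Ask which codecs decode 0xE9 to é.
e_acute_codecs = try_decodings(b"\xe9", target="\u00e9")
```

Calling `try_decodings(b"\xe0", target="\u00e9")` repeats the question asked with `-t` above for whatever codecs the local Python provides.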
davidsarah commented 2010-06-17 21:56:18 +00:00
Author
Owner

#1089 discusses the use of non-UTF-8 encodings by FTP and SFTP clients.

davidsarah commented 2010-06-21 01:30:51 +00:00
Author
Owner

Replying to davidsarah (comment:9):

Summary:

  • include UTF8 in the response to a FEAT request;
    [...]

Twisted's FTP implementation (http://twistedmatrix.com/trac/browser/trunk/twisted/protocols/ftp.py) does not currently implement FEAT. However, it is structured in such a way that monkey-patching it to do so is relatively easy, and no uglier than monkey-patching always is. Something like this (untested):

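from twisted.internet import defer
from twisted.protocols import ftp
# Assumed import: Tahoe's logging wrapper, which defines log.WEIRD.
from allmydata.util import log

def ftp_FEAT(self, arg=None):
    # Bail out if the Twisted internals this patch relies on have changed.
    if not (hasattr(self, 'shell') and hasattr(self.shell, 'feat') and
            hasattr(self, 'sendLine')):
        log.msg("Assumption needed to monkey-patch FEAT support in Twisted "
                "does not hold", level=log.WEIRD)
        return defer.fail(ftp.CmdNotImplementedError('FEAT'))

    # FEAT takes no argument.
    if arg is not None:
        return defer.fail(ftp.CmdSyntaxError('FEAT does not take any argument'))

    # Ask the shell which features it supports; it may return a Deferred.
    d = defer.maybeDeferred(self.shell.feat)
    def _reply(features):
        # Multi-line reply: one feature per line, each indented by a space.
        self.sendLine('211- Featuretastic!')
        for f in features:
            self.sendLine(' ' + f)
        return ftp.SYS_STATUS_OR_HELP_REPLY
    d.addCallback(_reply)
    return d

# Only install the patch if Twisted does not already provide FEAT.
if not hasattr(ftp.FTP, 'ftp_FEAT'):
    ftp.FTP.ftp_FEAT = ftp_FEAT

# In the existing Handler class in ftpd.py (other methods elided):
class Handler:
    def feat(self):
        # encoding_is_utf8() is a placeholder; it is not defined here.
        if self.encoding_is_utf8():
            return ['UTF8']
        else:
            return []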
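For reference, the shape of a FEAT reply can be sketched independently of Twisted. The layout below follows RFC 2389 as I read it: a `211-` opening line, one feature per line with a single leading space, and a closing `211 End`. `format_feat_reply` is a hypothetical helper, and the wording of the opening line is arbitrary.

```python
def format_feat_reply(features):
    """Build the lines of a FEAT reply for the given feature names."""
    if not features:
        # With nothing to advertise, a single-line 211 reply is enough.
        return ["211 No features"]
    lines = ["211-Features:"]
    # Each feature line starts with exactly one space.
    lines.extend(" " + feature for feature in features)
    lines.append("211 End")
    return lines


utf8_reply = format_feat_reply(["UTF8"])
empty_reply = format_feat_reply([])
```

A server would send each returned line terminated by CRLF on the control connection.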
tahoe-lafs modified the milestone from eventually to soon 2010-06-21 01:52:26 +00:00
zooko commented 2010-06-21 21:17:32 +00:00
Author
Owner

I opened http://twistedmatrix.com/trac/ticket/4515 (support the FTP FEAT request).

tahoe-lafs changed title from FTP frontend should support Unicode filenames to FTP frontend should support Unicode filenames encoded as UTF-8 2011-02-02 23:50:46 +00:00
zooko commented 2012-12-28 06:32:01 +00:00
Author
Owner

[Twisted #4515](http://twistedmatrix.com/trac/ticket/4515) has been closed.
davidsarah commented 2012-12-28 23:52:45 +00:00
Author
Owner

Unfortunately the fix for that ticket isn't sufficient, because adiroiban wrote in http://twistedmatrix.com/trac/ticket/4515:

> I don't plan to add IFTPShell.FEATURES in this patch since without UTF-8 support there will be nothing to export. Beside UTF-8 all other features (SIZE, MDTM, ect) are tied to the protocol.FTP implementation.

With this change, it is possible to declare support for UTF-8 by monkeypatching twisted.protocols.ftp.FTP.FEATURES, but that depends on an implementation detail, which is what we were trying to avoid. (Granted, it's a slightly less ugly monkeypatch.)

I don't know why adiroiban ignored me when I pointed out that the goal of that ticket could be achieved in a simpler way that would have been sufficient. Maybe I should have argued the case more strenuously.

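A guarded version of that monkeypatch might look like the sketch below. To keep it self-contained it patches a stand-in class instead of importing Twisted. Treating `FEATURES` as a class-level list of strings is an assumption about the implementation detail mentioned above, and `advertise_utf8` is a hypothetical helper.

```python
class StandInFTP:
    """Stand-in for twisted.protocols.ftp.FTP (assumed FEATURES layout)."""
    FEATURES = ["FEAT", "SIZE", "MDTM"]


def advertise_utf8(protocol_class):
    """Add UTF8 to protocol_class.FEATURES if the expected layout is present.

    Returns True if UTF8 is advertised afterwards, False if the class does
    not look the way this patch assumes and was left untouched.
    """
    features = getattr(protocol_class, "FEATURES", None)
    if not isinstance(features, list):
        return False
    if "UTF8" not in features:
        # Rebind to a new list so the original list object is not mutated.
        protocol_class.FEATURES = features + ["UTF8"]
    return True


patched = advertise_utf8(StandInFTP)
```

Returning False instead of raising lets the caller log the mismatch and carry on without UTF-8 support if a later Twisted release changes the attribute.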
davidsarah commented 2012-12-29 00:01:34 +00:00
Author
Owner

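Sigh, and it doesn't have a conformant implementation of OPTS (http://twistedmatrix.com/trac/browser/branches/add-ftp-feat-4515/twisted/protocols/ftp.py?rev=35770#L1361):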

def ftp_OPTS(self, option):
    """
    Handle OPTS command.

    http://tools.ietf.org/html/draft-ietf-ftpext-utf-8-option-00
    """
    return self.reply(OPTS_NOT_IMPLEMENTED, option)

http://tools.ietf.org/html/draft-ietf-ftpext-utf-8-option-00 says:

2. UTF-8 Option

   The user issues the OPTS UTF-8 command to indicate its willingness to
   send and receive UTF-8 encoded pathnames over the control connection.
   Prior to sending this command, the user should not transmit UTF-8
   encoded pathnames.
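For comparison, a handler that follows the quoted draft text might look roughly like this. It is a sketch, not Twisted code: `handle_opts` and the `session` dict are hypothetical, only the `UTF-8` option named in the draft is recognized, and the 200 and 501 reply codes are my reading of RFC 2389's OPTS replies.

```python
def handle_opts(option, session):
    """Handle an OPTS argument, returning a (code, message) pair.

    session is a plain dict standing in for per-connection state.
    """
    if option.strip().upper() == "UTF-8":
        # The client has declared it will send and accept UTF-8 pathnames.
        session["utf8_pathnames"] = True
        return (200, "UTF-8 pathnames enabled")
    return (501, "Option not recognized")


session = {}
ok_reply = handle_opts("UTF-8", session)
bad_reply = handle_opts("BOGUS", {})
```

Real clients may spell the option differently, so a production handler would need to decide which variants to accept.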
tahoe-lafs added
code-frontend-ftp-sftp
and removed
code-frontend
labels 2014-12-02 19:41:19 +00:00