Closed Bug 1563891 Opened 5 years ago Closed 4 years ago

No support for SMTPUTF8 (non-ASCII characters in the local part, RFC 6531)

Categories

(MailNews Core :: Networking: SMTP, enhancement, P5)

enhancement

Tracking

(thunderbird_esr78 wontfix)

RESOLVED FIXED
82 Branch
Tracking Status
thunderbird_esr78 --- wontfix

People

(Reporter: vesely, Assigned: gds)

References

()

Details

Attachments

(1 file, 9 obsolete files)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0

Steps to reproduce:

Send mail to EAI address, e.g. mimì@example.com.

Actual results:

An alert popup saying "There are non-ASCII characters in the local part of the recipient address. This is not yet supported. Please change this address and try again."

Expected results:

If the outgoing mail server supports SMTPUTF8, just send the whole message as if it were all-ASCII. An alert is only due if no SMTPUTF8 shows up in the ehlo response.

See bug#127399, search "illegal". I think I need an option to skip checking for illegal localparts. It is debatable whether TB should check the ehlo response before skipping those checks. IMHO, a user should state "I want to send EAI". Then, TB skips the checks. If the server complains, the user needs to change server —not recipients.

Hmm, I didn't know of that old bug, a more recent variation, somewhat unrelated, is here, bug 1504526.

Status: UNCONFIRMED → NEW
Ever confirmed: true
See Also: → 1200050
Type: defect → enhancement

I replicated the bug by attempting to send mail to mimí@foá.it

Component: Untriaged → Networking: SMTP
Product: Thunderbird → MailNews Core

(In reply to Matt from comment #2)

I replicated the bug by attempting to send mail to mimí@foá.it

I see the error"There are non-ASCII characters in the local part of the recipient address. This is not yet supported. Please change this address and try again." when sending with that address (with non-ascii in both local and domain parts).

If I send tomimi@foá.it(removed non-ascii from local part) I see the error"An error occurred while sending mail. The mail server responded: 5.1.1 <mimi@xn--fo-nia.it> recipient rejected. Please check the message recipient "mimi@xn--fo-nia.it" and try again."

In the first case the mail is never sent to the server because tb sees the "uncorrectable" error in the local part. In the second case, I fixed the local part but the non-ascii remains in the domain part. According to the code, the non-ascii in the domain is fixed using "ACE" and sent to the server, but my server rejects it.

Here's the code where these actions occur:
https://searchfox.org/comm-central/rev/30fb8479949b5246fda13398de98216dcb3ad346/mailnews/compose/src/nsSmtpProtocol.cpp#340

Also, looking at the SMTP:5 log from my ISP's server, it doesn't mention SMTPUTF8 in the EHLO response or any other response. Not sure how common it is for SMTP servers to support SMTPUTF8. But maybe the reporter is suggesting that we just ignore the local part non-ascii and send the message and just let the server deal with it (like it deals with the domain part)?

Thanks for the research, Gene. What is ACE? Looks like the service comes from IDN (internationalized domain names) and the result is punycode. Oh, I see, Wikepedia says: Punycode sequences have a so-called ASCII Compatible Encoding (ACE).

Of course we send the domain as punycode.

All these terms are new to me: ACE, IDN, punycode etc. Only UTF8 I've heard of but had to look up details. But willing to learn something new.

(In reply to gene smith from comment #3)

If I send tomimi@foá.it(removed non-ascii from local part) I see the error"An error occurred while sending mail. The mail server responded: 5.1.1 <mimi@xn--fo-nia.it> recipient rejected. Please check the message recipient "mimi@xn--fo-nia.it" and try again."

That's not my domain. It's foá vs. foà, my domain has a grave accent. It is misconfigured at this time, so it should accept anything and then bounce it.

In the first case the mail is never sent to the server because tb sees the "uncorrectable" error in the local part. In the second case, I fixed the local part but the non-ascii remains in the domain part. According to the code, the non-ascii in the domain is fixed using "ACE" and sent to the server, but my server rejects it.

Here's the code where these actions occur:
https://searchfox.org/comm-central/rev/30fb8479949b5246fda13398de98216dcb3ad346/mailnews/compose/src/nsSmtpProtocol.cpp#340

I'd welcome an option that skips the whole thing, lines 324-371. I'm not clear how effective that snippet is, because in the second case, mimi@foà.it, the address in the header arrives in UTF-8, the same as I typed it. That is, it is not converted. Perhaps it gets converted in the RCPT command? Having non-ASCII in the To: header field is still non-compliant for non-SMTPUTF8. (BTW, the source display with Ctrl-U is garbled, but that's another bug.)

Also, looking at the SMTP:5 log from my ISP's server, it doesn't mention SMTPUTF8 in the EHLO response or any other response. Not sure how common it is for SMTP servers to support SMTPUTF8.

This may be incomplete, but conveys the trend https://en.wikipedia.org/wiki/International_email#Adoption

But maybe the reporter is suggesting that we just ignore the local part non-ascii and send the message and just let the server deal with it (like it deals with the domain part)?

Given the EAI option —which sooner or later will become the norm— both local part and domain name can be accepted as UTF-8 with no conversions. In order to support legacy servers, the domain name should be converted with IDNA, both in the RCPT command and in the To: header field. In any case, it is useless to tinker with the local part. I'm not sure the argument in bug#127399 ("One of the 200 recipients given has an illegal localpart. Happy searching!") is valid, because the server would reject specific RCPT commands, thereby locating invalid recipients one by one. Furthermore, if that check is meant t catch typos, depending on the keyboard at hand, it can only catch a percentage of possible key mishit.

(In reply to Jorg K (GMT+2) from comment #4)

Of course we send the domain as punycode.

Punycode is only needed for DNS lookup. During the SMTP chit-chat it is enough that both parties can handle, and accept, multi-byte characters.

OK, I sent to test@foà.it and that message went out. I got a bounce (as expected):

A message that you sent could not be delivered to one or more of its
recipients. This is a permanent error. The following address(es) failed:
test@xn--fo-kia.it
host wmail.tana.it [62.94.243.226]
SMTP error from remote mail server after RCPT TO:<test@xn--fo-kia.it>:
513 Relaying denied.

In the sent message I see To: test@foà.it which is not garbled if I select Unicode as text encoding.

I'm not 100% sure what the code at
https://searchfox.org/comm-central/rev/30fb8479949b5246fda13398de98216dcb3ad346/mailnews/compose/src/nsSmtpProtocol.cpp#351
does, but by the looks of it, it converts the domain name to punycode. That's been the behaviour for ages, so I naïvely don't see anything wrong with that. Also, we're in a bug discussing the local part processing, not the domain name processing.

See Also: → idn-phishing

So should we just implement what was requested: If server advertises SMTPUTF8, send local part as UTF-8 without checking for ASCII and leave the domain processing as it is?

(In reply to Jorg K (GMT+2) from comment #7)

OK, I sent to test@foà.it and that message went out. I got a bounce (as expected):
(In reply to Jorg K (GMT+2) from comment #7)
In the sent message I see To: test@foà.it which is not garbled if I select Unicode as text encoding.

I see the error in the source (Ctrl-U) of both the bounce and the sent message. I cannot choose a charset on that window. Saved files and Copy Name and Email Address are ok.

I'm not 100% sure what the code at
https://searchfox.org/comm-central/rev/30fb8479949b5246fda13398de98216dcb3ad346/mailnews/compose/src/nsSmtpProtocol.cpp#351
does, but by the looks of it, it converts the domain name to punycode. That's been the behaviour for ages, so I naïvely don't see anything wrong with that.

The point is that you see To: test@foà.it, which is obviously not converted. Since that address should have been converted, there is something wrong in that piece of code.

Also, we're in a bug discussing the local part processing, not the domain name processing.

The code seems to be doing both things at once.

(In reply to Jorg K (GMT+2) from comment #8)

So should we just implement what was requested: If server advertises SMTPUTF8, send local part as UTF-8 without checking for ASCII and leave the domain processing as it is?

The difficulty seem to be to coordinate the object initialization, when the server response hasn't been seen yet, with the actual SMTP sending. Add to that the fact that if UTF-8 addresses are legal, senders who need to contact someone at an EAI address cannot send a message to an ASCII-only address instead. Therefore, I'd leave the choice to the sender. Do you plan to use EAI addresses? Set the option to true. Dunno, don't care? Set it to false. If the option is false, continue to half-convert domain names as usual and consider UTF-8 in the local part as illegal. Otherwise, skip all that block of code (assuming there are no other checks somewehre else).

(In reply to Alessandro Vesely from comment #9)

I cannot choose a charset on that window.

View > Text Encoding doesn't work?

The point is that you see To: test@foà.it, which is obviously not converted. Since that address should have been converted, there is something wrong in that piece of code.

The SMTP code were looking at seems to convert the domain for sending via the various internal objects it maintains. That doesn't mean we need to store it in the sent message as converted.

OK, I took the converter code out like so rv = NS_OK; // converter->ConvertUTF8toACE(domain, domain); and sent again. The message went out and got the same bounce, so I assume sending worked.

So looks like we can disable the code then the server has SMTPUTF8 capability. Gene, would you like to have a go at this? Alessandro, can you give us a test account at that domain.

Apparently changing the text encoding doesn't work for IMAP folders, see bug 1564094.

We already have bug 1259632 for SMTPUTF8 support on file. There are some suggestions made for implementing it. We'd have to read up on whether to send the domains in punycode even of the server supports SMTPUTF8 since it doesn't appear to make any difference of we do. But we need to drop the ASCII check on the local part.

Magnus, do you think we should allocate a resource to this if Gene doesn't want to do it?

Flags: needinfo?(mkmelin+mozilla)
See Also: → 1259632

(In reply to Jorg K (GMT+2) from comment #10)

(In reply to Alessandro Vesely from comment #9)

I cannot choose a charset on that window.

View > Text Encoding doesn't work?

That's already on "Unicode". Does it also affect th source window?

The point is that you see To: test@foà.it, which is obviously not converted. Since that address should have been converted, there is something wrong in that piece of code.

The SMTP code we're looking at seems to convert the domain for sending via the various internal objects it maintains. That doesn't mean we need to store it in the sent message as converted.

Right, sorry. I use BCC for the sent folder so I don't mind...

(In reply to Jorg K (GMT+2) from comment #13)

We already have bug 1259632 for SMTPUTF8 support on file.

I hadn't found it. So this bug is a duplicate of that one.

There are some suggestions made for implementing it. We'd have to read up on whether to send the domains in punycode even of the server supports SMTPUTF8 since it doesn't appear to make any difference of we do.

It may affect the display at the recipient's, though.

To avoid rfc2047 encoding in the Subject: and display names is another point.

(In reply to Jorg K (GMT+2) from comment #11)

Alessandro, can you give us a test account at that domain.

Sorry, I'm out of office at the moment, and don't have a form. Gmail has UTF-8 too:

ale@alenovo:~/tmp$ telnet alt1.gmail-smtp-in.l.google.com 25
Trying 108.177.14.27...
Connected to alt1.gmail-smtp-in.l.google.com.
Escape character is '^]'.
220 mx.google.com ESMTP s13si10684678ljg.53 - gsmtp
ehlo foo
250-mx.google.com at your service, [5.170.8.210]
250-SIZE 157286400
250-8BITMIME
250-STARTTLS
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-CHUNKING
250 SMTPUTF8
quit
221 2.0.0 closing connection s13si10684678ljg.53 - gsmtp
Connection closed by foreign host.

(In reply to Alessandro Vesely from comment #14)

View > Text Encoding doesn't work?

That's already on "Unicode". Does it also affect th source window?

Yes, you can set the encoding in the source windows and that works in TB 60.x.

(In reply to Jorg K (GMT+2) from comment #13)

Magnus, do you think we should allocate a resource to this if Gene doesn't want to do it?

I think not at the moment. It would be nice to tackle one day, but more as a part of a drive for email l18n.

Flags: needinfo?(mkmelin+mozilla)
Priority: -- → P5
See Also: → 1571672
Attached patch smtputf8.diff (obsolete) — Splinter Review

With this change you can now compose and send emails to an address like mimì@foà.it having any UTF8 character provided the smtp server EHLO response lists the capability SMTPUTF8. The address is not modified in any way, keeping and sending the UTF8 bytes as is.

When SMTPUTF8 is not listed (as with my usual server) you still get the error about the local part (string before the @) being uncorrectable if it contains non-ASCII UTF8. If the local part is pure ASCII, the domain part is still converted to punycode (using ACE converter) if it contains non-ASCII UTF8.

I had to move the code that loops through the recipient addresses since its current place is before the EHLO command is issued so it is unknown at that point if SMTPUTF8 is supported.

My changes are inspired by the suggestions here: https://rant.gulbrandsen.priv.no/eai+email.

There is still a problem when replying, forwarding or editing as new an existing email with non-ASCII UTF8 addresses. The address fields have the non-ASCII UTF8 replaced with the invalid/replacement character sequence as show here: Attachment 9156903 [details].
Edit: This is now fixed in Bug 1658361 and was caused by a regression as reporter Alessandro V. pointed out.

Edit: I probably also need to verify that the local and domain strings are truly UTF8. If not, local part need to be rejected. Not sure about domain part, maybe still convert it?

Alessandro, I noticed that after an email is sent with your smtp server, tb opens a new imap connection and then appends the just sent email to the "Sent" mailbox. Tb sends the ENABLE UTF8=ACCEPT for the new connection right before the append but the courier imap server responds with the alert anyhow. Otherwise, I never see the alert as long as "allow UTF8=ACCEPT" pref is false.
Edit: This also pertains to bug 1571672 comment 59 and is fixed there.

Assignee: nobody → gds

(In reply to gene smith from comment #20)

With this change you can now compose and send emails to an address like mimì@foà.it having any UTF8 character provided the smtp server EHLO response lists the capability SMTPUTF8. The address is not modified in any way, keeping and sending the UTF8 bytes as is.

Great! It works very well. FWIW, also DKIM signature covering "To: mimì@foà.it" is good.

Alessandro, I noticed that after an email is sent with your smtp server, tb opens a new imap connection and then appends the just sent email to the "Sent" mailbox. Tb sends the ENABLE UTF8=ACCEPT for the new connection right before the append but the courier imap server responds with the alert anyhow. Otherwise, I never see the alert as long as UTF8=ACCEPT pref is true. (This also pertains to bug 1571672.)

I'm sending this to Courier mailing list.

Summary: No support for SMTPUTF8 (non-ASCII characters in the local part) → No support for SMTPUTF8 (non-ASCII characters in the local part, RFC 6531)

(In reply to Alessandro Vesely from comment #21)

Alessandro, I noticed that after an email is sent with your smtp server, tb opens a new imap connection and then appends the just sent email to the "Sent" mailbox. Tb sends the ENABLE UTF8=ACCEPT for the new connection right before the append but the courier imap server responds with the alert anyhow. Otherwise, I never see the alert as long as UTF8=ACCEPT pref is true. (This also pertains to bug 1571672.)

I'm sending this to Courier mailing list.

Sam replied as follows:

APPEND with UTF8 uses a different syntax:

a ENABLE UTF8=ACCEPT
* ENABLED UTF8=ACCEPT
a OK Options enabled
a APPEND INBOX NIL NIL UTF8 (~{278}
+ OK
Received: from localhost (localhost [127.0.0.1])
 (forwarded by mimì@xn--fo-kia.it)
 by wmail.tana.it with local
 id 00000000005DC0B4.000000005F24DFE9.000027ED; Sat, 01 Aug 2020 05:22:17 +0200
Delivered-To: mimì@xn--fo-kia.it
To: mimì@foà.it
Subject: test

test
)
a OK [APPENDUID 398279247 18] APPEND Ok.

No warning messages.

I'll bet that Thunderbird sends ENABLE UTF8=ACCEPT, but does not use the UTF8 APPEND command.

Python didn't get it right at first, too:

https://bugs.python.org/issue34138

Fixed the append by copying the the python patch. Fixed in Bug 1571672.

(In reply to Alessandro Vesely from comment #17)

Created attachment 9156903 [details]
Daily 79.0a1 (2020-06-14) worsened the appearance of UTF-8 fields

See bug 1571672 comment 34

Now back to this. On reply, forward, edit-as-new, edit draft etc, any reflected address that contains non-ascii utf-8 bytes (between 0x7f and 0xff) are seen as invalid and appears at a ? inside a triangle. If the email is sent, these bytes are send as the 3 byte "replacement character" sequence 0xef, 0xbf, 0xbd (U+FFFD). But if at least one of the characters has a code-point greater than U+00FF, such as U+01FF or ǿ, the bytes are processed as char16's instead of bytes. With my fix for imap UTF8=ACCEPT (bug 1571672) in mozilla code, the "lost" upper bytes that were discarded are retained and the reflected address displays correctly and are sent correctly as UTF8. Without the UTF8=ACCEPT change in mozilla code, having code points greater then U+00FF in the address fails (reflects replacement chars or the wrong char) because the upper byte of any UTF-16 char are discarded in the mozilla code.

This fixes address chars with with at least 1 code point greater than U+00FF. It doesn't help when all code points are non-ascii less than U+01FF:
attachment 9166880 [details] [diff] [review]. The proposed changes to mozilla code are described here: bug 1571672 comment 51.

Anyhow, I'm kind of stuck here and don't know where these "reflected" address are written. Even if I did, not sure that would help.

Attached patch smtputf8-v2.diff (obsolete) — Splinter Review

Ok, after some research, debugging and trial and error I now have a possible fix. When stored recipient addresses are reflected, e.g., on edit-as-new, edit draft etc, the .js function convert8BitHeader() occurs here: https://searchfox.org/comm-central/rev/085526b22865ea90738d779fa14bf711d46f69d3/mailnews/mime/jsmime/jsmime.js#684.

For each recipient address the function is called twice. On the first call the parameter headerValue is the uint16_t string with the lower byte containing the UTF-8 byte(s) corresponding to each character of the address string; the upper bytes are all zero. The string is converted to a byte array of UTF-8 and is "decoded" back into headerValue as UTF-16 and headerValue is returned. So, at this point on return of the first call, when headerValue is looked at with the built-in .js debugger, it is correct (no corrupted chars).

Before the next call to convert8BitHeader(), the headerValue string has the upper byte zeroed. This only affects characters with code point U+01FF or greater and occurs here: https://searchfox.org/comm-central/rev/085526b22865ea90738d779fa14bf711d46f69d3/mozilla/js/src/jsapi.cpp#4532. I have a fix to avoid the losses as part of the UTF8=ACCEPT fix that is also needed here to completely fix the reflection bug for these higher code points.

Anyhow, for code points less than U+01FF, the 2nd call to convert8BitHeader() sees the value of headerValue returned from the first call passed back in. So now the headerValue is a UTF-16 string. The decoding assumes that headerValue consists of UTF-8 bytes so when it is "decoded as UTF-8", any non-ascii UTF-16 values are changed to "replacement" characters U+FFFD (the ? in a diamond).

The fix for this in convert8BigHeader() requires saving the passed in headerValue in keepInString and then attempt to decode headerValue as UTF-8 same as always. But instead of just returning headerValue, first check if it now contains replacement chars. If it does, I then create a "typed" byte array from the headerValue. If the "typed" string length has not changed I then check if a lower byte has changed to 0xFD (typed string ignores the upper bytes) for any bytes. If so, I now just return the saved keepInString which is still the correctly encode UTF-16 string.

(TBD: Maybe just return keepInString if I detect the replacement char in the decoded string and skip other checks.)

With this change, here are some examples of addresses that are reflected correctly with no corrupted/replacement symbols:

jøran@blåbærsyltetøy.gulbrandsen.priv.no
jøran@blåbærsyltetøy.gulbrandsen.privý.no

In the above examples, each non-ascii char is at code point less than U+01FF so possible truncation of the upper byte by mozilla code (jsapi.cpp) is not an issue.

But without the change to avoid losses of upper UTF-16 bytes in mozilla code, this address
jøran@blåbærsyltetøy.gulbrandsen.priv.nǿ
is reflected as
jøran@blåbærsyltetøy.gulbrandsen.priv.nÿ
because ǿ (U+01FF) is truncated to ÿ (U+00FF).

Attachment #9167462 - Attachment is obsolete: true

Gene, this line doesn't work:
https://searchfox.org/comm-central/rev/085526b22865ea90738d779fa14bf711d46f69d3/mozilla/js/src/jsapi.cpp#4532.
It doesn't work since comm-central/ ... /mozilla is not a valid repo. You mean this:
https://searchfox.org/mozilla-central/rev/491612fa0be0f809069ea62c6316bf452cacc816/js/src/jsapi.cpp#4532

Before the next call to convert8BitHeader(), the headerValue string has the upper byte zeroed.

So why is that upper byte zeroed? I think you need to look for any interface which is declared as string/ACString. That needs to change to wstring/AString or to AUTF8String, like here https://hg.mozilla.org/comm-central/rev/23725f858c42 where we introduced unicode passwords.

I think the string processing here:
https://searchfox.org/comm-central/rev/085526b22865ea90738d779fa14bf711d46f69d3/mailnews/compose/src/nsSmtpUrl.cpp#594
nsSmtpUrl::GetRecipients(char** aRecipientsList) needs to change. The definition is here:

https://searchfox.org/comm-central/rev/085526b22865ea90738d779fa14bf711d46f69d3/mailnews/compose/public/nsISmtpUrl.idl#27
attribute string recipients;
Make what a AUTF8String and Bob should be your uncle ;-) - I think, the sender below should receive the same treatment.

I guess this one will have to change, too:
https://searchfox.org/comm-central/rev/085526b22865ea90738d779fa14bf711d46f69d3/mailnews/base/public/nsIMsgHdr.idl#70
So you need to adjust all C++ code around these calls:
https://searchfox.org/comm-central/search?q=GetRecipients%28&path=&case=false&regexp=false
Let me know if you want a hand with that.

Hmm, looking at this again, perhaps the change in nsIMsgHdr.idl#70 isn't necessary, since if you do, you have a huge ripple effect into the author, ccList and bccList attributes. So start with nsISmtpUrl.idl#27 and maybe nsISmtpUrl.idl#47. Looks like GetSender() is only used here:
https://searchfox.org/comm-central/rev/085526b22865ea90738d779fa14bf711d46f69d3/mailnews/compose/src/nsSmtpProtocol.cpp#721
so that's a simple change.

This is my proposal for the string changes. I let it ripple up a bit to promote smart string goodness as far as reasonable.

Thanks for looking at this Jorg. Magnus had suggested a change to AUT8String at one interface and I tried it and it made no difference. Later he asking something about "attributes" but never responded to my following up question so I just kept trying to find the reason this didn't work.
Obviously, you and others know a lot more about this than I do so probably been just wasting my time the last few weeks or (maybe months, haven't tracked my time) working on this and the other issue of UTF8=ACCEPT in imap folder names.

Anyhow, I don't know if you have tested the above patch so I'll give it a try. I'll assume I don't need the changes that I proposed and remove them and try it with just your patch (which I don't instantly comprehend).

Well, the zeroing out is exactly from the JS engine and the faulty interface definition. I don't know what you changed, but I suggest to rebase your patch only mine and try again. I can also look into it if you wish. (The changes in JS Mime are not acceptable, as you will understand.)

Yes, I changed the attribute recipients/sender and those changes rippled up. I let them ripple up further than necessary, as I said on comment #28 since we want to reduce the use of raw strings.

I "effectively" removed my changes and applied your patch. Don't see any difference and bug still occurs like before (replacement chars in reflected recipient addresses, points equal to or greater than U+01FF have upper byte masked to zero). Then I put my changes back along side your patch and it's fixed again. But I'll do as you suggest and shelve all my changes (including the mozilla code, UTF8=ACCEPT and SMTPUTF8 changes) back to pristine trunk and apply your patch to make sure I'm not somehow breaking it. It'll probably take a while to build so I'll report back what I see when it finishes...

Ok, it took a while to build but I don't see any difference.

Just to be clear, the upper byte getting masked off here
https://searchfox.org/mozilla-central/rev/491612fa0be0f809069ea62c6316bf452cacc816/js/src/jsapi.cpp#4532
is not the only problem. The changes I did here are the primary fix: https://searchfox.org/comm-central/rev/085526b22865ea90738d779fa14bf711d46f69d3/mailnews/mime/jsmime/jsmime.js#684
But maybe there are other interfaces that need fixed?

I'm a bit confused by "my changes" and "your patch". My patch just should just ensure that UTF-8 data passing through the interfaces doesn't get truncated down to ASCII (7bit) or Latin-1 (8bit, single byte). I don't quite understand what U+01FF is about. I mean, after 0xFF you have 0x100, right? OK, U+01FF is ǿ, and U+00FF is that ÿ in UTF-8. So then the issue is not ASCII/Latin-1 vs. UTF-8, but rather string vs. wide string. If you look at comment #25, I also mentioned wstring/AString. Maybe the interface needs to change to AString and not AUTF8String since JS strings are NOT UTF-8 encoded, they are wide strings in UTF-16. Refer to what was done for the unicode passwords: https://hg.mozilla.org/comm-central/rev/23725f858c42

Before the next call to convert8BitHeader(), the headerValue string has the upper byte zeroed. This only affects characters with code point U+01FF or greater and occurs here ...

OK, but where does that happen in C-C code? Through which API does that headerValue string pass? Where does it get lost in C-C code. You could do a stack trace with a breakpoint at jsapi.cpp#4532. And also get the JS stack using DumpJSStack() which you can evaluate in the C++ debugger, at least I can on Windows.

I'm happy to look at it if you rebase your patch on top of mine. Maybe you could also add some debug to show the loss of the data. As I pointed out in comment #26, those headers are also passed around in inferfaces from nsIMsgHdr.idl. When I search for headerValue, I see code like this: this.mDummyMsgHeader.author = header.headerValue. That's problematic since you may be assigning a "regular" JS (wide UTF-16) string to a "narrow" string. And if you have to change nsIMsgHdr.idl, you'll have a HUGE ripple on effect, and I don't know whether this is even possible since those header values are stored in the somewhat limited Mork database backend. OK, that can generally store UTF-8 OK, but you'd have to be very careful to convert JS string to UTF-8 at the right moment ... and back. All this looks like a small research project ;-)

Also, surely, you will need your changes in nsSmtpProtocol.cpp/h. Sadly, trunk is busted right now, see the tree closure for details.

Attached patch jorg-patch-and-my-patch.diff (obsolete) — Splinter Review

Not sure if this is what you want, but here is your patch and my SMTPUTF8 and jsmime.js fixes combined. This is with mozilla and comm-central at these tip points:

mozilla: changeset:   542131:f4703bddd567
comm: changeset:   30212:76593bdd0358

This doesn't include any changes to mozilla code.

Since you say the current trunk is broken and I haven't updated past the above points (about July 26th), you should be able to update/checkout to the above revisions, apply the patch and duplicate my build.

However, I just noticed that if I run this with 68, the non-ascii recipients are reflected correctly with no problems! I should have read the reporter's comment 17 and the referenced comment to see that this occurred sometime between 68 release and early June on the daily build. Not sure if it affects 78.

So I really should check to see what's changed that is causing this instead of tweaking code.

So I really should check to see what's changed that is causing this instead of tweaking code.

Doing some test from archives, the latest 68.11 is OK but latest 78 version has the problem. Then doing some bisecting on daily archives I found that the problem occurred betwen 2019-12-11 and 2019-12-12. From the archived .txt file, here are the hg versions:

12-11: good (non-ascii chars reflect correctly)
20191211104528
https://hg.mozilla.org/comm-central/rev/df3dbb04dd384e4c8fd7a7e01f28bde758646bb2
https://hg.mozilla.org/mozilla-central/rev/f83f2771414cf5938a46f44e1b11cdcd5181ea0f

12-12: bad (non-ascii chars reflect as replacement symbols)
20191212095840
https://hg.mozilla.org/comm-central/rev/019eaf0f9cf72621a436ce06b2607bc5a3532fae
https://hg.mozilla.org/mozilla-central/rev/192e0e33eb597e8d923eb89f6d49bf42654e9d11

I checked the diff for comm and mozilla central and I really can't tell anything as to what caused the problem to start occurring on Dec 12th. Both have LOTS of changes on that day. There does seem to be a lot of activity for "pills" which I think might be the term for the new recipient display containers, so this might be relevant.

Does TB's JavaScript use any UTF-8 conversion at all? For example, npm's utf8. Is it too late?

Sooner or later JavaScript will have to change its internal character encoding. I see such ideas for Java but found none for JS.

Not sure if this is what you want, ...

I wanted your patch as a second patch based on mine, and also with the JS Mime changes removed, so the problem becomes evident.

Here are the ranges:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f83f2771414cf5938a46f44e1b11cdcd5181ea0f&tochange=192e0e33eb597e8d923eb89f6d49bf42654e9d11
https://hg.mozilla.org/comm-central/pushloghtml?fromchange=df3dbb04dd384e4c8fd7a7e01f28bde758646bb2&tochange=019eaf0f9cf72621a436ce06b2607bc5a3532fae

I don't understand comment #35: Are you saying that the new pill code from bug 440377 also impacts on being able to enter non-ASCII characters for recipients? That would be very bad and another bug we should fix separately. That pill code is all in JS and it's highly likely that it hits an API that can't handle non-ASCII input.

EDIT: So how do I reproduce that issue? Entering jøran@blåbærsyltetøy.gulbrandsen.priv.nǿ and saving a draft doesn't seem to expose any issue.

I wanted your patch as a second patch based on mine, and also with the JS Mime changes removed, so the problem becomes evident.

OK, this removes my change to jsmime.js, keeps my SMTPUTF8 changes and merges in your patch.

Here are the ranges:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f83f2771414cf5938a46f44e1b11cdcd5181ea0f&tochange=192e0e33eb597e8d923eb89f6d49bf42654e9d11
https://hg.mozilla.org/comm-central/pushloghtml?fromchange=df3dbb04dd384e4c8fd7a7e01f28bde758646bb2&tochange=019eaf0f9cf72621a436ce06b2607bc5a3532fae

Edit: Looking through these, other than maybe the "pill" push, I don't see anything the jumps out. But even within the "pill" push, I don't see anything about char encoding or UTF* etc. There are hundreds of mozilla changes in that one day and I mostly just looked at the summaries.

I don't understand comment #35: Are you saying that the new pill code from bug 440377 also impacts on being able to enter non-ASCII characters for recipients? That would be very bad and another bug we should fix separately. That pill code is all in JS and it's highly likely that it hits an API that can't handle non-ASCII input.

No idea. Never heard of a "pill" before looking at this code. So any effect it might have is just an hypothesis.
Edit: "Entering" the non-ascii recipient address is not the problem. Also, once entered it displays OK on the "thread" pane and in the "reading" pane. The problem only occurs when the address is "reflected" on the "Write" window.

EDIT: So how do I reproduce that issue? Entering jøran@blåbærsyltetøy.gulbrandsen.priv.nǿ and saving a draft doesn't seem to expose any issue.

Once you have saved the draft you then have to edit the draft or "edit as new" the saved draft. Only then do you see the bad chars. This is what I mean by "reflected" in my previous comments. I tried it again on Dec 11 daily and it works OK (reflects properly when "Write" window appears). With Dec 12 archive it reflects with "replacement chars". In both cases the address looks OK before you edit the draft. So the problem with bad chars only occurs when the "Write/Compose" window appears.
Also, this really has nothing to do with the SMTPUTF8 change since that only comes in to play when the message is actually sent. So the problem occurs on any daily version since Dec 12 and on 78 releases. So really my attached diff is not needed.

Attachment #9169082 - Attachment is obsolete: true

OK, thanks. I'll file a separate bug for that when I'm back at my desk. FYI, the recipient pills are the new way of entering recipients.

Depends on: 1658361

OK, filed bug 1658361. I guess that needs to be fixed first. It's related to sending, since saving a draft using the send pipeline, but not SMTP.

(In reply to Alessandro Vesely from comment #36)

Does TB's JavaScript use any UTF-8 conversion at all? For example, npm's utf8. Is it too late?

Sooner or later JavaScript will have to change its internal character encoding. I see such ideas for Java but found none for JS.

I could be wrong but "TB's JavaScript" is actually the JS implemented mostly in the mozilla (firefox) specific code. I have read that JS specification requires the use of UTF-16 internally. Given that, at least one TB specific JS function does convert UTF-8 input to UTF-16 for internal use. It is the function I had to "tweak" to "fix" the reflection bug: https://searchfox.org/comm-central/rev/085526b22865ea90738d779fa14bf711d46f69d3/mailnews/mime/jsmime/jsmime.js#684. The root of the problem here is that it gets called twice. The first call it receives the correct address string as UTF-8 and it get convert properly to UTF-16. On the second call, it receives the address as the just converted UTF-16. It is designed to convert from UTF-8 so it sees the UTF-16 input as invalid for any non-ASCII chars. This is not a problem for ascii addresses since for ascii, numerically UTF8 == UTF16 == ASCII so the 2nd call doesn't cause a problem. Why it gets called twice may be the problem!

Comment on attachment 9169034 [details] [diff] [review]
1563891-recipients-sender-AUTF8String.patch (now formatted)

Not needed, issue fixed in bug 1658361.

Attachment #9169034 - Attachment is obsolete: true
Attachment #9169194 - Attachment is obsolete: true
Attachment #9169025 - Attachment is obsolete: true
Attachment #9167462 - Attachment is obsolete: false
Attached patch 1563891-support-SMTPUTF8.patch (obsolete) — Splinter Review

This cleaned up smtputf8.diff: attachment 9167462 [details] [diff] [review]

As mentioned before this adds a new capability value SMTP_EHLO_SMTPUTF8_ENABLED. There currently exists a similar sounding capability called SMTP_EHLO_8BIT_ENABLED. This actually refers to the 8BITMIME capability. So to distinguish them and to make the name more specific to its purpose, I renamed SMTP_EHLO_8BIT_ENABLED to SMTP_EHLO_8BITMIME_ENABLED.

This also adds a change to the error string. The same change is needed for suite but it can be added in a later patch iteration.

Attachment #9156903 - Attachment is obsolete: true
Attachment #9167462 - Attachment is obsolete: true
Attachment #9173541 - Flags: review?(mkmelin+mozilla)
Comment on attachment 9173541 [details] [diff] [review]
1563891-support-SMTPUTF8.patch

Review of attachment 9173541 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay, this looks good I think.
Did you send it for a try run yet?

::: mailnews/compose/src/nsSmtpProtocol.cpp
@@ +725,5 @@
>  
>    m_nextStateAfterResponse = SMTP_SEND_MAIL_RESPONSE;
>    SetFlag(SMTP_PAUSE_FOR_READ);
>  
> +  // This block moved from above to here where SMTPUTF8 capability is known.

This comment will be confusing to future readers. Maybe phrase it like

Now that we know whether SMTPUTF8 capability is available...
Attachment #9173541 - Flags: review?(mkmelin+mozilla) → review+
Status: NEW → ASSIGNED

Fixed the comment about "block moved".

Attachment #9173541 - Attachment is obsolete: true
Attachment #9175108 - Flags: review?(mkmelin+mozilla)

Did you send it for a try run yet?

I'll do that now. Sorry, I missed this question this before attaching the patch.

Comment on attachment 9175108 [details] [diff] [review]
1563891-support-SMTPUTF8-v2.patch

Setting patch obsolete. Looks like there is at least one relevant unit test failure. I'll check it closer tomorrow.

Attachment #9175108 - Attachment is obsolete: true
Attachment #9175108 - Flags: review?(mkmelin+mozilla)

Just noticed that suite strings need updated too. Are we still updating suite strings to match tb?

Might as well... though it's not like anyone will ever see them in practice :/

See Also: → 1664733

I moved the check for bad addresss up some right before the smtp send of "MAIL FROM" since there is no point in sending other SMTP commands, except HELO/EHLO and QUIT, if there are bad addresses.

After send button is clicked and there are bad addresses two pop-ups appear: an "alert" that you have bad addresses and, typically hidden below it, a connection status. If the alert is dismissed the status pop-up also goes away and you can edit the bad addresses. If you dismiss the status pop-up first, the alert remains and it can also be dismissed but then you see another pop-up saying the connection was lost during the send, try again. Then you can edit the addresses. But in both cases the message you are trying to send gets copied to Sent folder (a.k.a., FCC) even though the message was never sent.

To fix this I added a new variable m_DataCommandWasSent. Since the bad addresses are detected before the smtp DATA command is sent and the QUIT command still occurs after pop-ups are dismissed but OnStopRequest() typcially receives an NS_ERROR_ABORT so FCC is allowed here https://searchfox.org/comm-central/rev/67ff5d7aa78c03b51b1ccd06b8636f3ac28e5fbf/mailnews/compose/src/nsSmtpProtocol.cpp#510. If there are address problems, we don't want to ignore the error so that FCC is still prevented. With no address errors and after DATA command is sent, it's OK to ignore the QUIT error and allow FCC to occur.

This was not a problem with the original code because the address validity checks occurred before any SMTP activity even occurred. Now the smtp EHLO command must occur to know about the capabilities.

The unit test also needed some fixes. It does tests by sending 4 messages. Two tests passed as is: sending a normal correct ascii-only address message and sending a message with ascii local part and non-ascii domain part. Neither of these trigger the address validity check that is now moved in the main code to after the EHLO response. The test failure occurred for the 3rd and 4th test where the address has non-ascii in the local part, e.g., skrä@test.com, and where the address only has a local part (no @). Both cases trigger the alert and subsequent SMTP commands never occur and the test was expecting them (now only command EHLO occurs). My solution is to only expect EHLO for these tests and terminate these tests (call do_test_finished()) after detecting EHLO. This avoids subsequent timeout errors.

This was also not a problem with the original code because test 3 and 4 never caused any SMTP command to occur, not even EHLO. So after detecting the alert, both tests terminated normally since no SMTP activity occurred.

Note: the smtp fakeserver doesn't support SMTPUTF8 so unit tests don't address this. I have not (yet) looked into adding this capability to the server and unit tests.

Attachment #9175667 - Flags: review?(mkmelin+mozilla)
Comment on attachment 9175667 [details] [diff] [review]
1563891-support-SMTPUTF8-v3.patch

Review of attachment 9175667 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thx Gene! r=mkmelin
Attachment #9175667 - Flags: review?(mkmelin+mozilla) → review+
Target Milestone: --- → 82 Branch

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/5a86d10d79c2
Add support for SMTPUTF8 (allow valid UTF-8 characters in recipient addresses, RFC 6531) r=mkmelin DONTBUILD

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/8e098fd90170
followup to fix linting. rs=eslint DONTBUILD

Don't know why test_sendMailAddressIDN.js becomes unstable on my local, if I run it ten times, about five times will fail. It seems there are two places that can trigger onStopSending:

  1. https://searchfox.org/comm-central/source/mailnews/compose/src/nsSmtpProtocol.cpp#2113
  2. https://searchfox.org/comm-central/source/mailnews/base/src/nsMsgProtocol.cpp#393

When nsMsgProtocol::OnStopRequest is called first, the aStatus will be 0, and the test will fail.

test_sendMailAddressIDN.js seems to fail consistently on macOS, after checking https://treeherder.mozilla.org/#/jobs?repo=comm-central&selectedTaskRun=QPOumZdRTzKT9EX6AP1zjw.0 and all following builds.

Regressions: 1665652

Filed bug 1665652 for that.

(In reply to Arthur K. [He/Him] from comment #58)

This appear to have caused bug 1710224. See https://bugzilla.mozilla.org/show_bug.cgi?id=1710224#c43

I see the "You are spamming" error with 78.8.1 which I don't think contains this SMTPUTF8 feature based on target milestone above (between comment 51 and comment 52).

The "You are spamming" response of comment 59 to "RCPT TO" is probably just due to my ISP's IP address range being on a blacklist. After further testing using openssl command line method, it appears that the SMTPUTF8 parameter on the "MAIL TO" command sent by TB is causing the problem with "Sorry, this is spam" response after sending the DATA (see bug 1710224 comment 67). However, sending this should be OK and causes no problems for other servers that also support SMTPUTF8 like gmail and courier (Alessandro's tana.it). Why "Easyname" smtp server is seeing the message as spam when the SMTPUTF8 parameter is present in the "MAIL TO" command is not yet known but seems to be a server bug.

See Also: → 1726106
You need to log in before you can comment on or make changes to this bug.