Sending E-mails
Hint: hovering the mouse pointer over the left of any of the following headers displays an anchor link, which you can click to get a link to that exact header.
And another reminder that you will want to read my other write-up, since that is the closest to a complete walkthrough I have. Skip the Perl section, but make sure you get to telnet and openssl s_client.
And of course, I should mention that SMTP isn't the full story. You need a way to receive incoming emails just as much as you need to compose outgoing ones; the former is the concern of mail access protocols such as POP/IMAP and isn't really covered here... for two reasons. One, it hurts my fingers trying to type the tags (IMAP is a relatively more modern protocol, so I guess that's partly why); and two, I don't get it. But it makes sense: a write-only protocol like SMTP (if we ignore commands like VRFY) is bound to be less convoluted than a read-and-write protocol like IMAP. (In that sense, SMTP really does live up to the S in its name, which is a good thing!) But yeah. While SMTP is one part of an email client, you can't practically use email with SMTP alone, because you're missing out on IMAP.
A few tips
There are two important parts to understanding how we send emails: the format (which governs the structure, syntax, and encoding/decoding of the email message) and the protocol (a mutual contract between the client and the server).
- The protocol, known as SMTP (Simple Mail Transfer Protocol), involves talking to a mail relay server and queuing a message for delivery. (It's like dropping off your package at a post office, or a UPS Store... well, I've only done it once, at a UPS Store, to return something from Amazon.)
- The format consists of two parts: the headers and the body. Which is which should be pretty evident: everything before the first blank line is the headers, and everything after it is the body. (Need an example that isn't as convoluted as the ones you get from Gmail/Outlook? Read what I wrote between DATA and the single period in the telnet section, the one with alice and bob.)
- The most obvious standard is SMTP, which you can just Google to find that it's a chain of RFCs that started with RFC 821 (later succeeded by RFC 2821 and then RFC 5321). There are a few others concerned with authentication, extensions, and so on, but RFC 821 and its successors will be the primary reference. We do care about authentication, which is covered in another RFC (RFC 4954) and which I will cover in gruesome detail, but that's the only "other" thing we care about.
- The headers may be a bit more elusive. Over time, the email header format spread well beyond email messages themselves (possibly because it is so effective at representing key-value pairs of metadata), so you may, say, see the same headers show up in an HTTP response. The one true reference for email headers, however, remains RFC 822 and its successors (RFC 2822 and RFC 5322). I want to emphasize this because, from the title, it may not be obvious that these are the right standards: after all, RFC 822 is titled "Standard for the Format of ARPA Internet Text Messages". What does email have to do with ARPA messages? Well, long story short, ARPANET was the first network of its scale and, in some sense, you can call it the "precursor" of the Internet we know today. (One thing to note about RFC 822 is that you do want to include the full, four-digit year, since we're in a new century and it is now necessary to tell, say, 2025 apart from 1925. Section 3.3 of RFC 2822 defines the date-time format (and, if my eyes don't deceive me, RFC 5322 uses the exact same one); make sure to use that instead of the one given in RFC 822.)
- The content of an email is the most interesting part, IMO. But to understand why it is the way it is, you should know that SMTP was not used only over TCP/IP, and that ASCII computers were prevalent, but not universal (EBCDIC was still a thing). So not only did the Internet committees have to be precise about the ASCII characters and the exact byte codes they use, they also had to be wary of protecting "unsafe" characters (like trailing spaces) and escaping "illegal" non-ASCII characters (ISO-8859-* and UTF-8 charset extensions, or even binary files like a PNG image or a tarball, which may contain arbitrary bytes, including the "meta" control characters!). For years, email just flat-out didn't allow any of these "unsafe" and "illegal" characters to be transported. That is, until MIME came around: RFC 2045 (Part 1) through RFC 2049 (Part 5) define the Multipurpose Internet Mail "Extensions" (MIME) to the original ARPANET emails, which:
- Define mechanisms to escape the aforementioned "unsafe" and "illegal" characters (MIME Part 1), namely the lesser-known Quoted-Printable encoding (which bears some resemblance to URI percent-encoding) and Base64 (yes, that Base64 encoding!)
- Define standards for naming media types (MIME Part 2), as well as ways to register new media types (MIME Part 4). Text media types, in addition to the type name, also have a character set (charset, which is essentially a map from raw bytes to the characters of our human languages). Emails that do not use MIME, for example, have the following implicit MIME type, charset, and transfer encoding:
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Note that there is always a MIME-Version of 1.0 in every MIME message. We might as well ignore its value: it has never changed, since it turned out not to be so easy for the creators of MIME to "update" the specs (Nathaniel Borenstein, one of the creators of MIME, has given his own account of this). But the header is usually there (by the standard it MUST be)... and while it is functionally devoid of any practical use, I like to think of it as a reminder of what it is that empowers us to do what we can with emails. :)
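To make the header/body split and the defaults above concrete, here is a minimal Python sketch of my own (not something from the RFCs) that assembles a plain-text message by hand. The helper names build_message and split_message and the example.com addresses are hypothetical placeholders; email.utils.format_datetime produces a Date value with the four-digit year.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def build_message(sender: str, recipient: str, subject: str, body: str) -> str:
    """Assemble a plain-text message: headers, one blank line, then the body."""
    headers = [
        f"From: <{sender}>",
        f"To: <{recipient}>",
        f"Subject: {subject}",
        # RFC 2822/5322 style date-time, e.g. "Fri, 09 Nov 2001 01:08:47 +0000"
        f"Date: {format_datetime(datetime.now(timezone.utc))}",
        # The implicit MIME defaults, spelled out explicitly
        "MIME-Version: 1.0",
        'Content-Type: text/plain; charset="US-ASCII"',
        "Content-Transfer-Encoding: 7bit",
    ]
    # The first blank line (CRLF CRLF) separates headers from body
    return "\r\n".join(headers) + "\r\n\r\n" + body


def split_message(raw: str) -> tuple[str, str]:
    """Split a raw message at the first blank line into (headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    return head, body


raw = build_message("alice@example.com", "bob@example.com", "Hi", "Hello, Bob!\r\n")
head, body = split_message(raw)
```

Only the first blank line matters: any later blank lines belong to the body, which is why partition (split once) is used rather than split.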
A numbered list is used for the tips so you can reference them later. For anchors, append a dash and the number at the end (for sublists, append another dash and the letter, e.g. #tips-3-a).
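To see the two escaping mechanisms from MIME Part 1 side by side, Python's standard library happens to ship both: quopri for Quoted-Printable and base64 for Base64. A minimal sketch (encode_both is a hypothetical helper name, and "café" is an arbitrary sample): Quoted-Printable leaves printable ASCII readable and escapes only the other bytes as =XX, while Base64 turns everything into opaque ASCII.

```python
import base64
import quopri


def encode_both(text: str, charset: str = "utf-8") -> tuple[str, str]:
    """Return (Quoted-Printable, Base64) encodings of text in the given charset."""
    raw = text.encode(charset)
    # Quoted-Printable: printable ASCII passes through, other bytes become =XX
    qp = quopri.encodestring(raw).decode("ascii")
    # Base64: every 3 input bytes become 4 characters from a 64-character alphabet
    b64 = base64.b64encode(raw).decode("ascii")
    return qp, b64


qp, b64 = encode_both("café")
print(qp)   # caf=C3=A9
print(b64)  # Y2Fmw6k=
```

The charset matters here: the same "é" would be a single =E9 in ISO-8859-1, which is exactly why text types carry a charset parameter.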
A brief overview
Before we begin, it is necessary to emphasize that SMTP (and most, if not all, Internet protocols) only recognizes CRLF as the newline sequence. If you are on DOS/Windows, this is probably the only line ending you've used your whole life, but it's something to pay attention to on Unix/Linux/modern macOS (and we do not talk about classic Mac OS, from before OS X, when it used a lone CR. That's just sheer insanity.)
Telnet works because the telnet client abides by the Telnet protocol (RFC 854), which uses CRLF. But openssl s_client will not work without the -crlf flag, at least not when you run the command on Linux/Unix: there it sends exactly what your terminal feeds to its standard input, which is a lone Unix LF. The situation is exactly the opposite in the Windows terminal: -crlf is precisely what you don't want, since MS-DOS has long used the same CRLF sequence that Internet protocols speak.
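If you script the conversation instead of typing it, the LF-to-CRLF translation that nc -C and openssl s_client -crlf do for you has to happen in your own code. A minimal Python sketch, assuming your text was built with Unix LF line endings (to_crlf is a hypothetical helper name):

```python
def to_crlf(text: str, encoding: str = "ascii") -> bytes:
    """Translate LF line endings to the CRLF that SMTP expects, then encode."""
    # Undo any existing CRLF first so it doesn't turn into CR CR LF
    normalized = text.replace("\r\n", "\n")
    return normalized.replace("\n", "\r\n").encode(encoding)


print(to_crlf("EHLO example.com\nQUIT\n"))  # b'EHLO example.com\r\nQUIT\r\n'
```

The default "ascii" encoding fails loudly on non-ASCII input, which is arguably what you want unless the server has advertised 8BITMIME.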
In general, an SMTP client queues an email in the following four steps:
- An initial handshake after a connection is established: HELO or EHLO. (Note that once a plain-text connection is upgraded to TLS using STARTTLS, this handshake has to be done again.)
- Authentication with a SASL mechanism using AUTH <mechanism>. (SASL is short for "Simple Authentication and Security Layer", though I don't find the full name very enlightening.)
- The originator address using MAIL FROM and the destination addresses using RCPT TO. (Note that if you want to use the 8BITMIME capability to send raw 8-bit text such as UTF-8, you add the BODY=8BITMIME parameter after the MAIL FROM address, the one enclosed in angle brackets. See RFC 6152 or my example in the telnet section.)
- The email message itself, using DATA. (And no, in case you were reading RFC 2045 and are wondering about the "binary" CTE, I will not talk about BDAT. If you are curious, read RFC 3030, but realize that sending raw bytes defeats the whole point of SMTP being a line-oriented text protocol you can talk to directly with a keyboard.)
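Putting the four steps together, the lines a client types can be sketched in Python as below. This is my own illustration with hypothetical names (client_commands, the example.com addresses): it only produces the client's side and ignores the server's replies, which a real client must read and check after every command. It also applies the dot-stuffing rule from RFC 5321: a message line that starts with "." gets an extra "." so it can't be mistaken for the lone period that ends DATA.

```python
from typing import Optional


def client_commands(helo_name: str, sender: str, recipients: list[str],
                    message: str, auth_line: Optional[str] = None,
                    eight_bit: bool = False) -> list[str]:
    """List, in order, the lines an SMTP client sends (server replies not handled)."""
    lines = [f"EHLO {helo_name}"]                   # step 1: handshake
    if auth_line is not None:
        lines.append(auth_line)                     # step 2: AUTH <mechanism> ...
    mail_from = f"MAIL FROM:<{sender}>"
    if eight_bit:
        mail_from += " BODY=8BITMIME"               # RFC 6152 parameter after the address
    lines.append(mail_from)                         # step 3: envelope sender...
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]  # ...and recipients
    lines.append("DATA")                            # step 4: the message itself
    body = message.replace("\r\n", "\n").removesuffix("\n")
    for line in body.split("\n"):
        # Dot-stuffing: double a leading "." so it can't end DATA early
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")                               # a lone period ends the message
    lines.append("QUIT")
    return lines


cmds = client_commands("client.example", "alice@example.com", ["bob@example.com"],
                       "Subject: hi\r\n\r\n.hidden line\r\nbye\r\n")
wire = "\r\n".join(cmds) + "\r\n"   # what actually goes over the socket
```

Returning a list rather than writing to a socket keeps the sketch testable; removesuffix needs Python 3.9 or later.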
You can tell from the highlighted step (authentication) what this will all be about. Because I have already introduced the other parts in the write-up I've linked to too many times, the focus will be just on SASL mechanisms.
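As a first taste of a SASL mechanism, here is a minimal Python sketch of the AUTH PLAIN line as I read RFC 4616: an optional authorization identity, a NUL byte, the username, another NUL, and the password, all UTF-8 encoded and then Base64-encoded. auth_plain_line is a hypothetical helper name, and the credentials are placeholders. Keep in mind that Base64 is an encoding, not encryption, so only send this over TLS.

```python
import base64


def auth_plain_line(username: str, password: str, authzid: str = "") -> str:
    """Build an AUTH PLAIN command with the initial response inline."""
    # [authzid] NUL authcid NUL passwd; the NULs are why you can't just type it
    message = f"{authzid}\x00{username}\x00{password}".encode("utf-8")
    return "AUTH PLAIN " + base64.b64encode(message).decode("ascii")


print(auth_plain_line("alice", "secret"))  # AUTH PLAIN AGFsaWNlAHNlY3JldA==
```

Leaving authzid empty (the usual case) means "act as the user I authenticated as".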
SMTP servers typically listen on one of three TCP ports. On a Unix machine, this information can be looked up in /etc/services (which, by extension, means you can find that file under the same path on Linux and macOS, and, if you are lucky, on Windows under C:\Windows\System32\drivers\etc\services).
- SMTP, in its raw form, on port 25,
- SMTP, encrypted over SSL/TLS, on port 465 (basically, SMTPS is to SMTP what HTTPS is to HTTP),
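- SMTP, initially connected in its raw form but subsequently upgraded to TLS using STARTTLS, on port 587. (This is called the "submission" port: RFC 6409 sets it aside for mail clients submitting new messages, as opposed to port 25, which mail servers use to relay messages to one another.)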
People don't really use port 25 because, well, it's plain text. Its upside (you don't need any special software to read it) is also its downside: anyone can read it, even when they're not supposed to. On CSL, being logged in already "authenticates" you in a sense, so that's one case where port 25 does get used. Everywhere else, it's either 587 or 465.
By the way, I don't really have a recommendation for one port over another. Just use whichever you are lucky enough to use. :)
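If you would rather let a library handle the transport, Python's smtplib maps onto the three ports fairly directly. A minimal sketch, assuming only these three ports matter (connection_mode and open_smtp are hypothetical helper names of mine, and host is whatever server you are using); any other port raises KeyError on purpose.

```python
import smtplib
import ssl


def connection_mode(port: int) -> str:
    """Map the three common SMTP ports to how the connection is secured."""
    return {25: "plain", 465: "implicit-tls", 587: "starttls"}[port]


def open_smtp(host: str, port: int) -> smtplib.SMTP:
    """Open an SMTP connection secured the way the port expects."""
    context = ssl.create_default_context()
    mode = connection_mode(port)
    if mode == "implicit-tls":
        # Port 465: TLS from the very first byte, like HTTPS
        return smtplib.SMTP_SSL(host, port, context=context)
    conn = smtplib.SMTP(host, port)
    if mode == "starttls":
        # Port 587: start in plain text, upgrade, then repeat the EHLO handshake
        conn.ehlo()
        conn.starttls(context=context)
        conn.ehlo()
    return conn
```

SMTP_SSL is a subclass of SMTP, so callers can treat all three results the same way afterwards.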
We will use the following commands to connect to SMTP, while ensuring newlines are properly translated to the Internet CRLF:
# Port 25 SMTP
telnet hostname 25 # telnet: already uses CRLF per protocol
nc -C hostname 25 # netcat: -C means translate LF to CRLF
# Port 465 SMTPS
openssl s_client -nocommands -crlf -connect hostname:465
# Port 587 STARTTLS
openssl s_client -nocommands -crlf \
-starttls smtp -connect hostname:587
(-nocommands disables the special one-letter commands that trigger TLS-level actions, such as renegotiation on a line starting with "R". I'm sure you won't like your connection getting all weird when you type the "R" in "RCPT TO", so disable them. -connect is entirely optional, but you can keep it if you think it reads nicely.)
And once again: if you are on Windows, do not add the
-crlf.
Sending without authentication (warm-up)
I added this part after I had finished writing how to do it with AUTH PLAIN... so follow that, but replace "openssl s_client" with "telnet localhost 25" and skip AUTH PLAIN (obviously, duh. That's the whole point of the title.) For this to work, you also need to be logged into a CSL machine (that is, you have connected using "ssh CSL_USER@best-linux.cs.wisc.edu", or something like that).
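For comparison, the same unauthenticated send can be scripted with Python's standard library. A minimal sketch, assuming you are on a CSL machine where, as described above, a server accepts mail on localhost port 25; make_message and send_without_auth are hypothetical helper names, and the addresses are placeholders. EmailMessage.set_content fills in the MIME headers for you, and send_message takes the envelope sender and recipients from the From and To headers.

```python
import smtplib
from email.message import EmailMessage


def make_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text message; set_content adds the MIME headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_without_auth(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Send over plain SMTP with no AUTH step; QUIT is sent when the block exits."""
    with smtplib.SMTP(host, port) as conn:
        conn.send_message(msg)


msg = make_message("alice@example.com", "bob@example.com", "Hi", "Hello, Bob!")
# send_without_auth(msg)  # only works where localhost:25 accepts your mail
```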
Because there aren't any credentials I need to hide here, I also made an asciinema recording of me doing it! You can watch it to get an idea of how it all works. Click here and it will take you to asciinema.org. (I cannot embed it here because CORS is too complicated :c)
Here is the YouTube video that sparked my interest in all of this. Of course, it doesn't cover MIME or SASL authentication; the rest I figured out myself by reading the same docs I am giving to you. But it is interactive and more picture-rich... so there's that advantage. :)
Sending as @cs.wisc.edu and @gmail.com
This page explains how to send as @cs.wisc.edu and @gmail.com. You should work through it before trying XOAUTH2! There is a "challenge" at the end where you can check whether you worked through everything correctly. Note that you won't be able to read your UW email this way, since your @cs.wisc.edu address simply forwards to the @wisc.edu inbox. You can with Gmail, since it gives you both IMAP and SMTP in one place.
Sending as @wisc.edu
This page explains how to send as @wisc.edu. The focus will be on retrieving an OAuth access token and refresh token using UvA-FNWI/M365-IMAP. You will also be able to receive emails this way, using the same access token. I will assume you know how to Base64-encode quirky strings containing ASCII control characters that normally cannot be typed on a keyboard.
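For XOAUTH2, the quirky string is, as far as I understand the format Google and Microsoft document, "user=" plus the address, a Ctrl-A (0x01) byte, "auth=Bearer " plus the access token, then two Ctrl-A bytes, all Base64-encoded and sent as AUTH XOAUTH2 followed by the result. A minimal Python sketch; xoauth2_string is a hypothetical helper name, and the address and token are placeholders for your own address and whatever M365-IMAP gives you.

```python
import base64


def xoauth2_string(user: str, access_token: str) -> str:
    """Build the Base64-encoded XOAUTH2 initial response."""
    # \x01 is Ctrl-A, one of those characters you can't easily type
    raw = f"user={user}\x01auth=Bearer {access_token}\x01\x01"
    return base64.b64encode(raw.encode("ascii")).decode("ascii")


line = "AUTH XOAUTH2 " + xoauth2_string("someone@wisc.edu", "ACCESS_TOKEN")
```

Access tokens expire, which is why the refresh token matters: you use it to get a fresh access token and rebuild this string.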