simple mail parser for extra headers?
Hello other RSA customers and RSA people,
I don't know much about writing custom packet decoder parsers and would rather not engage PS, but I'm wondering what other people have done to extract custom mail headers.
Ideally we'd want to run it alongside the RSA mail (or rather SMTP) parser instead of cloning it as customer_custom_smtp_lua.
Basically we want to parse two things:
a) Text headers from RFC 2822 (https://www.ietf.org/rfc/rfc2822.txt), e.g. Message-ID:, References:, In-Reply-To:
(these don't seem environment specific, so perhaps they could become part of the standard parser?)
b) Value headers passed by the mail gateway to our mail server:
i) simple flag headers, e.g. X-ExecAttachment: True
ii) text-type headers, e.g. X-MailSandbox-StatusOrVerdict: unknown, pending, malicious, etc.
I'm wondering what other people have done for similar problems, both parser-wise and meta-key-wise.
A couple of things to check out:
The mail_lua options file. Once you deploy it, you can get to the options file on the decoder from Decoder > config > files (DO NOT SUBSCRIBE TO THIS FILE).
There are a number of functions you can change;
check for the ones that end with return true (for enabled).
You can also check out the X-Factor parser, which might be able to get you what you want,
or this one for the SPF headers.
That helps, at least for the X- parts:
a) Unfortunately there's nothing about X- headers in the mail options, at least in the Lua version we have. (We already use the src/dst option from mail_options_lua, set to true. P.S. Yes, I don't really get why it's in Live as luax at all, given that half the point is editing the true/false values.)
b) A flex (or native) parser? I thought it was Lua only from now on (or does it not matter, as long as you're not deploying both native and Lua and double parsing?). Overall, the aggregate advice from peeking at the SPF and X-Factor blogs is helpful, but I have a few more questions.
Overall it somewhat fits our need, regardless of the parser format (we can at least do a quick-and-dirty parse of the specific X-xxx headers if we don't want X-everything like in the blog). Two things bother me:
i) It doesn't seem to be specific to service 25. Is there any way to make it so? (Basically I want it as specific as possible to avoid unintended side effects; if we want it for 80 later on, we can make that decision then.)
ii) Perhaps it could be even more specific and confined to the header section (prior to the message body), although something tells me that would no longer make it a simple parser, and I should be quiet and happy and try it out as is.
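On point ii), the idea of confining a lookup to the header section can be sketched in plain Lua. This is only an illustration of "stop at the first empty line"; it does not use the decoder's parser API, it says nothing about restricting by service type, and the function names headerSection and headerValue are made up for this example.

```lua
-- Hypothetical helpers, not part of MAIL_lua or any RSA parser.

-- Return only the header section of a raw message: everything before the
-- first empty line. If there is no empty line, the whole input is returned.
local function headerSection(raw)
    -- Plain find (4th argument true), so the CRLF pair is matched literally.
    local stop = raw:find("\r\n\r\n", 1, true)
    if not stop then
        return raw
    end
    return raw:sub(1, stop + 1) -- keep the CRLF that ends the last header
end

-- Return the value of the first header with the given name, looking only
-- at the header section. The name comparison is case-insensitive.
local function headerValue(raw, name)
    -- Unfold continuation lines (CRLF followed by a space or tab).
    local headers = headerSection(raw):gsub("\r\n[ \t]+", " ")
    local wanted = name:lower()
    for line in headers:gmatch("[^\r\n]+") do
        local field, value = line:match("^([^:%s]+)%s*:%s*(.-)%s*$")
        if field and field:lower() == wanted then
            return value
        end
    end
    return nil
end
```

With this, a line in the message body that merely looks like X-ExecAttachment: True is ignored, because the search never goes past the blank line that ends the headers.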
This sort of helps with parsing References: and In-Reply-To: as well... sort of.
In the simplest form we could take either of those and/or Message-ID out to a special meta key or an email meta key, I suppose (and I'm sure if I read RFC 2822 carefully I'd notice a few dozen other caveats, but I digress).
I did notice these headers sometimes have multiple tokens per line. How would we do basic tokenization by a delimiter (although here the grammar seems to say whitespace, <blah@bluh>, whitespace)? Basically I want to split out the message IDs and tag each one to a specific meta key:
message-id = "Message-ID:" msg-id CRLF
in-reply-to = "In-Reply-To:" 1*msg-id CRLF
references = "References:" 1*msg-id CRLF
msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS]
id-left = dot-atom-text / no-fold-quote / obs-id-left
id-right = dot-atom-text / no-fold-literal / obs-id-right
no-fold-quote = DQUOTE *(qtext / quoted-pair) DQUOTE
^ let's forget they also list the old format old-message-id or else we'll be here forever...
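For the tokenization question, one rough approach is to ignore the whitespace entirely and pull out every <left@right> token with a Lua pattern. The sketch below is plain Lua with a made-up function name (messageIds); it covers only the common dot-atom style of msg-id, so IDs whose left side is a quoted string containing spaces, and the obsolete forms, would be skipped.

```lua
-- Hypothetical helper, not part of MAIL_lua.
-- Collect every msg-id found in a header value such as the contents of
-- References: or In-Reply-To:. The angle brackets are stripped.
local function messageIds(value)
    local ids = {}
    -- "<", one or more characters that are not brackets or whitespace,
    -- "@", one or more of the same, then ">".
    for id in value:gmatch("<([^<>%s]+@[^<>%s]+)>") do
        ids[#ids + 1] = id
    end
    return ids
end

-- Usage: a folded References: value with three IDs.
local refs = "<1234@local.machine.example> <3456@example.net>\r\n\t<abcd.1234@local.machine.tld>"
for _, id in ipairs(messageIds(refs)) do
    print(id) -- each ID would be registered as its own meta value
end
```

Because the pattern anchors on the angle brackets rather than on a delimiter, it works the same whether the IDs are separated by spaces, tabs, or folded lines.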
Nothing is simple about parsing mail
MAIL_lua already extracts In-Reply-To
For other custom additions: the source to MAIL_lua is included with the Parsers book, though the version there is a little out of date with what's in Live:
You could create your own custom parser from it by adding functions for extracting the headers of your choice to the table "mailFunctions". And to be clear: what you are talking about is parsing mail, not SMTP. MAIL_lua parses the contents of email messages regardless of transport (SMTP, POP, IMAP, LMTP, et al). SMTP_lua only parses the SMTP protocol itself.
Re MAIL_lua already parsing In-Reply-To:
Oh, I see:
local meta = extractAddresses(header, "email.src")
Out of interest, why not the other two (Message-ID and References)?
Are there any plans to include some sort of X-xxx option in mail_lua_options in the future (similar to X-Factor, but mail only)?
Also, regarding putting the meta into email.src... hmm, I don't know. Is that really the right place for it? Going by https://www.ietf.org/rfc/rfc2822.txt I'd be inclined to say a separate meta key:
The "References:" field will contain the contents of the parent's "References:" field (if any) followed by the contents of the parent's "Message-ID:" field (if any). If the parent message does not contain a "References:" field but does have an "In-Reply-To:" field containing a single message identifier, then the "References:" field will contain the contents of the parent's "In-Reply-To:" field followed by the contents of the parent's "Message-ID:" field (if any). If the parent has none of the "References:", "In-Reply-To:", or "Message-ID:" fields, then the new message will have no "References:" field. Note: Some implementations parse the "References:" field to display the "thread of the discussion". These implementations assume that each new message is a reply to a single parent and hence that they can walk backwards through the "References:" field to find the parent of each message listed there. Therefore, trying to form a "References:" field for a reply that has multiple parents is discouraged and how to do so is not defined in this document.
>mail.lua:VERSION 2016.05.16.1 william motley... 2012.11.28.2 william motley
>Nothing is simple about parsing mail
>You could create your own custom parser from it by adding functions for extracting the headers of your choice to the table "mailFunctions".
Heh... we've had a coname_mail.lua parser before (either a colleague of mine or support helped with it), prior to email.src/dst going into the mail parser and its options. But for large parsers we want to refrain from the temptation: we don't want to run both coname_mail.lua and mail.luax in parallel, and modifying it to do only what we want is not necessarily trivial. The other limiting factor is not having an active test decoder seeing live traffic for this sort of thing.
I know this thread is a bit dated, but I need the same "message-id" header parsed into metadata. I have reviewed the Parsers book, specifically the mail.lua file, and I note that the parser is already attempting to tokenize "message-id", in any of its various possible forms, out of the header.
["MESSAGE-ID:"] = mailParser.mailHeader,
["Message-ID:"] = mailParser.mailHeader,
["Message-id:"] = mailParser.mailHeader,
["message-id:"] = mailParser.mailHeader,
However, it seems that after being teased out of the header, it is not being put into metadata of any kind in mailFunctions. I know this file is a bit dated, and much has probably changed since its 2016 authoring, so I'm curious what the current recommendation is for capturing that header token and pushing it to proper metadata. Can you offer any current insights?
My gut tells me to add a mailFunctions entry for ["message-id"] to mail.lua, but I'm hoping there's a way to overload it or add it through mail_lua_options, as opposed to modifying mail.lua...
Using the latest versions of MAIL_lua and MAIL_lua_options from Live, there will be an option "customHeaders". With it you can define other headers for which you would like the header values registered as meta.
["message-id"] = "message.id",
Will register the values from "message-id" headers to the key "message.id". Header name is not case sensitive, so you don't need to list "MESSAGE-ID", "Message-ID", et al.
For the key, normal key name restrictions apply. Also note that if you want to query the key, you'll need to index it appropriately if it isn't already.
You'll need to restart the decoder service after modifying that option for the change to take effect.
Side note: HTTP_lua has a similar option.
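Putting the customHeaders advice together with the headers asked about at the top of the thread, the option might end up looking something like the sketch below. Treat the shape as an assumption: it guesses that the options file wraps each option in a function that returns its value (in line with the earlier "return true" remark), so check the comments in your own copy of the options file. Only the "message-id" to "message.id" mapping comes from this thread; the other key names are invented for illustration and would need to follow your own key naming and indexing.

```lua
-- Sketch only: the function wrapper is an assumed layout for the options
-- file, and every key except "message.id" is a hypothetical name.
function customHeaders()
    return {
        ["message-id"] = "message.id",                      -- from this thread
        ["references"] = "mail.references",                 -- hypothetical key
        ["x-execattachment"] = "mail.execattach",           -- hypothetical key
        ["x-mailsandbox-statusorverdict"] = "mail.sandbox", -- hypothetical key
    }
end
```

In-Reply-To is left out because MAIL_lua already extracts it. Note that a References: value registered this way would presumably arrive as one string containing several message IDs, so splitting it into individual IDs would still need custom handling.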