Ingo Schubert

Introduction into Identity Federation Part 3

Blog Post created by Ingo Schubert Employee on Apr 8, 2016

Introduction

NameID mapping is easily the most complicated and least understood part of SAML. That makes it a perfect subject for the first deeper look into SAML. If you get NameID mapping, the rest of SAML will be a walk in the park.

What is a NameID?

A NameID is the subject of a SAML assertion.

Let's look at part of the SAML assertion from the last article:

 

<saml:Subject>

      <saml:NameID  Format="urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified">

John Doe

</saml:NameID>

</saml:Subject>

 

... and remove the clutter:

 

<Subject>

     <NameID>

          John Doe

     </NameID>

</Subject>

 

So the NameID is the guy/girl the SAML assertion is all about.
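For readers who prefer code over angle brackets, here is a minimal sketch of how an SP-side script could pull the NameID out of a Subject element. It only uses the Python standard library. The xmlns:saml declaration is an assumption added here so the snippet parses on its own (in a real assertion it normally sits on an enclosing element), and read_name_id is a made-up helper name, not part of any SAML library.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# SAML 2.0 assertion namespace; declared here so the fragment parses standalone.
SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}

SUBJECT_XML = """
<saml:Subject xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <saml:NameID Format="urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified">
    John Doe
  </saml:NameID>
</saml:Subject>
"""


def read_name_id(subject_xml: str) -> Tuple[str, Optional[str]]:
    """Return (value, format) of the NameID inside a Subject element."""
    subject = ET.fromstring(subject_xml)
    name_id = subject.find("saml:NameID", SAML_NS)
    if name_id is None or name_id.text is None:
        raise ValueError("Subject has no NameID")
    # Pretty-printed XML carries whitespace around the value, so strip it.
    return name_id.text.strip(), name_id.get("Format")


value, name_id_format = read_name_id(SUBJECT_XML)
print(value)  # John Doe
print(name_id_format)
```

Note that this only reads the value. A real SP must validate the signature and conditions of the assertion before trusting anything in it.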

Why map NameIDs?

By "NameID", SAML means the identifier of a SAML subject.

That is in most cases the user ID on the IDP or the SP... and this is where the trouble starts.

 

It is unlikely that a user has the same user ID at both the IDP and the SP.

 

Example: While the old RSA employee user ID scheme used <first letter first name><lastname>, e.g. ischubert, EMC uses <first five letters last name><first letter first name>, e.g. schubi.

EMC also has that badge number which doesn't relate to the real name at all.

So just imagine the mix of different user IDs in a bigger environment.

Additionally there are scenarios where e.g. a group/role from one side is mapped to a single user on the other side. E.g. instead of sending the individual user ID from Company A employees over, the subject would be "Company A employee".
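To make the mismatch concrete, here is a small sketch of the two user ID schemes from the example above, plus a lookup table of the kind an IDP could keep to translate its local ID into the one the SP expects. The function names, the table, the user jdoe and the "Company A employee" entry are illustrative assumptions, not anything SAML prescribes.

```python
def rsa_style_id(first_name: str, last_name: str) -> str:
    """<first letter first name><lastname>"""
    return (first_name[:1] + last_name).lower()


def emc_style_id(first_name: str, last_name: str) -> str:
    """<first five letters last name><first letter first name>"""
    return (last_name[:5] + first_name[:1]).lower()


print(rsa_style_id("Ingo", "Schubert"))  # ischubert
print(emc_style_id("Ingo", "Schubert"))  # schubi

# Hypothetical mapping table: local IDP user ID -> subject sent to the SP.
# The last entry shows the group/role case: an individual is sent as one shared subject.
NAME_ID_MAP = {
    "ischubert": "schubi",
    "jdoe": "Company A employee",
}


def map_name_id(idp_user_id: str) -> str:
    """Look up the subject to send; fail loudly if no mapping exists."""
    try:
        return NAME_ID_MAP[idp_user_id]
    except KeyError:
        raise LookupError(f"no NameID mapping for {idp_user_id!r}") from None
```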

What are NameID Formats?

NameID Formats are rules for how to submit user identifier information from the IDP to the SP and vice versa.

A NameID format has an identifier (a name...) and mandates a certain format for the Subject ID of the SAML assertion.

That doesn't make it terribly clear, does it?

 

Let's get more specific:

SAML 2.0 knows 8 formats (and you can make up your own too....):

  • Unspecified
  • EMail
  • X.509
  • Windows Domain
  • Kerberos
  • Entity Identifier
  • Persistent
  • Transient

 

The most popular NameID format is...

EMail!

The EMail NameID Format Identifier is: "urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress"
As you can see from the identifier, this format has existed since SAML 1.1.

A Subject definition with this format could look like this:

<saml:Subject>

      <saml:NameID  Format="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress">

          John.Doe@company.com

     </saml:NameID>

</saml:Subject>

 

In the SAML standard docs (SAML 2.0 Core) it says for this format:

It indicates that the content of the element is in the form of an email address, specifically "addr-spec" as
defined in IETF RFC 2822 [RFC 2822] Section 3.4.1. An addr-spec has the form local-part@domain.

Note that an addr-spec has no phrase (such as a common name) before it, has no comment (text surrounded
in parentheses) after it, and is not surrounded by "<" and ">".

 

Or way shorter: the NameID must be a valid eMail address (valid in the sense that it follows RFC 2822 - not that the eMail address actually exists).

So if an IDP and SP agree on using this NameID format, they agree to use the user's eMail address every time they send a SAML assertion or request about this user to one another.
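As a rough illustration of what "must be an addr-spec" means in practice, the following sketch checks a NameID value against a deliberately simplified pattern. It is an assumption-laden shortcut: it only accepts the plain dot-atom form of local-part@domain and ignores the quoted-string and domain-literal variants that RFC 2822 also allows. The helper name looks_like_addr_spec is made up for this example.

```python
import re

# Characters RFC 2822 allows in an "atom" (simplified view).
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
# One or more atoms separated by single dots.
_DOT_ATOM = _ATEXT + r"(?:\." + _ATEXT + r")*"
_ADDR_SPEC = re.compile(_DOT_ATOM + "@" + _DOT_ATOM)


def looks_like_addr_spec(name_id: str) -> bool:
    """True if the value is a bare local-part@domain with no phrase, comment or angle brackets."""
    # Surrounding whitespace from pretty-printed XML is stripped before checking.
    return _ADDR_SPEC.fullmatch(name_id.strip()) is not None


print(looks_like_addr_spec("John.Doe@company.com"))             # True
print(looks_like_addr_spec("John Doe <John.Doe@company.com>"))  # False
print(looks_like_addr_spec("John.Doe@company.com (John)"))      # False
```

Like the format itself, this says nothing about whether the mailbox actually exists.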

 

The Email NameID format is probably the most widely used format. Almost every cloud service uses the user Email address as the primary identifier. Makes sense: it's a nice flat namespace and a familiar format for everybody involved.

 

The other NameID Formats explained

Persistent

As the name suggests this means the NameID matching a user exchanged between IDP and SP stays the same over a very long period of time or even forever.

You could say that EMail is also persistent. True, but the NameID format urn:oasis:names:tc:SAML:2.0:nameid-format:persistent has some unique characteristics. It is not only about using a unique permanent identifier - it is also about what this identifier looks like and how it is handled across multiple IDPs and SPs.

What does it look like?

The SAML 2.0 Core doc has a good explanation:

<The persistent NameID Format...> Indicates that the content of the element is a persistent opaque identifier for a principal that is specific to an identity provider and a service provider or affiliation of service providers. Persistent name identifiers generated by identity providers MUST be constructed using pseudo-random values that have no discernible correspondence with the subject's actual identifier (for example, username).

The intent is to create a non-public, pair-wise pseudonym to prevent the discovery of the subject's identity or activities. Persistent name identifier values MUST NOT exceed a length of 256 characters.

 

Example: FE9ADE3FFA97C8DF225BBBC05D3521AB6850005312B03

 

Important to know: Each IDP-SP relationship has a unique persistent identifier for a given user (e.g. schubi).

The ID for the IDP-SP1 relationship could be FE9ADE3FFA97C8DF225BBBC05D3521AB6850005312B03 but for IDP-SP2 it is 1170EECBC620B97BC599A1608CEBFFF1271813.

There is one exception to this rule. IDPs and SPs that are part of an affiliation agree to always use the same opaque, persistent ID for a specific user.

An example of an affiliation could be a large corporation that uses federation internally between all divisions. Personally I have never seen this feature (affiliations) of SAML 2.0 in real life.
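The pair-wise rule described above can be sketched on the IDP side roughly like this. It is a toy in-memory store under stated assumptions: the class name, the entity IDs and the choice of 20 random bytes are all made up for illustration, and a real IDP would persist the table. Affiliations are left out.

```python
import secrets


class PersistentIdStore:
    """Toy IDP-side store: one opaque ID per (SP, local user) pair."""

    def __init__(self) -> None:
        self._ids = {}  # (sp_entity_id, local_user_id) -> opaque ID

    def name_id_for(self, sp_entity_id: str, local_user_id: str) -> str:
        key = (sp_entity_id, local_user_id)
        if key not in self._ids:
            # Pseudo-random, so there is no discernible link to the user ID.
            # 40 hex characters, far below the 256 character limit.
            self._ids[key] = secrets.token_hex(20).upper()
        return self._ids[key]


store = PersistentIdStore()
id_at_sp1 = store.name_id_for("https://sp1.example", "schubi")
id_at_sp2 = store.name_id_for("https://sp2.example", "schubi")

print(id_at_sp1 == store.name_id_for("https://sp1.example", "schubi"))  # True
print(id_at_sp1 == id_at_sp2)
```

The second comparison is expected to come out False: two independent random values colliding is possible in theory but practically never happens.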

How is a persistent ID generated and mapped to a real user ID?

This opaque identifier is of course mapped to a real user ID on each side. On the IDP side, when a user is federated out to a specific SP for the first time, the IDP generates a new opaque identifier and sends it to the SP in the SAML assertion. The SP obviously has no idea who the persistent ID is for - it has never encountered this ID from this specific IDP before.

To find out which local user this persistent ID is for, it asks the user to identify & authenticate.

If the user can do that, the SP links the (now known) local ID to this new persistent identifier.

The next time the persistent ID comes in, the SP knows which local user this is all about. So this little dance (login at IDP, go to new SP, authenticate at SP) is needed every time a user goes from an IDP to an SP he hasn't SSO'd into via this IDP before. There is the more exotic case where the IDP and SP can do this matching in the background via a batch job. Nothing is really specified here in the SAML standard. With good reason: Seldom will the SP and IDP have user repositories with all the profile info needed to link two local IDs to a persistent ID - if they did, they might as well use another NameID format or make up one of their own to map the user IDs.
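The little dance described above could look roughly like this on the SP side. This is only a sketch: the SAML standard does not specify how the linking is done, so the class, the authenticate callback and the entity ID are assumptions made up for illustration.

```python
from typing import Callable, Dict, Optional, Tuple


class AccountLinks:
    """Toy SP-side table linking (IDP, persistent ID) to a local user."""

    def __init__(self, authenticate: Callable[[], Optional[str]]) -> None:
        # authenticate() stands in for the SP's own login page: it returns
        # the local user ID on success and None on failure.
        self._authenticate = authenticate
        self._links: Dict[Tuple[str, str], str] = {}

    def resolve(self, idp_entity_id: str, persistent_id: str) -> str:
        key = (idp_entity_id, persistent_id)
        if key in self._links:
            # Known ID: plain SSO, no local login needed.
            return self._links[key]
        # Unknown ID: the user has to prove who he is at the SP once.
        local_user = self._authenticate()
        if local_user is None:
            raise PermissionError("local login failed, nothing is linked")
        self._links[key] = local_user
        return local_user


logins = []


def fake_login() -> Optional[str]:
    """Pretend the user typed valid SP credentials."""
    logins.append("login")
    return "schubi"


sp = AccountLinks(fake_login)
idp = "https://idp.example"
opaque = "FE9ADE3FFA97C8DF225BBBC05D3521AB6850005312B03"

print(sp.resolve(idp, opaque))  # schubi (after a local login)
print(sp.resolve(idp, opaque))  # schubi (no login this time)
print(len(logins))              # 1
```

The key includes the IDP on purpose: the same opaque value arriving from a different IDP is a different relationship and must not match.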

Preserving privacy with persistent NameIDs

So what about the "non-public, pair-wise pseudonym to prevent the discovery of the subject's identity or activities"?

This is the cool part: That prevents SP1 and SP2 working together to trace what a user does across SPs - they can't simply ask the other "what is user X doing on your side?" because X is different for both - even if it is about the same user. They simply can't tell. There is no way for them to look into the other's NameID for a specific user - this is why they are called "opaque" identifiers after all.

Managing a persistent ID

SAML 2.0 also specifies that users can manage their own persistent IDs. They can request a new one or delete a specific one. That comes in handy if you'd like to cut the link between an IDP and SP because you no longer use the SP and would like to tell it to forget everything about you. I've never seen this used in real life but I can imagine it being useful.

Advantages and Disadvantages

There are obviously a bunch of strong advantages:

  • The user can have different IDs on the IDP and the SP.
  • IDP and SP don't have to communicate directly to link the local IDs to the opaque identifier. The user does this by logging into the SP and proving he "owns" the opaque ID sent by the IDP.
    • Also possible in bulk in the background but hardly practical
  • It preserves the user's privacy by making it really hard for the SPs to work together to find out what the user does at the other SPs.

But of course there are some disadvantages:

  • The user has to log on at the SP to prove he owns the persistent ID.
    • This requires that the user has an account at the SP including some form of credential to authenticate.
    • The user has to perform a login at the SP the first time - sometimes that is seen as not very user friendly.

Why all this effort?

One of the big changes of SAML 2.0 over 1.x was the introduction of new NameID formats such as persistent and transient. With SAML 1.x, IDP and SP in practice worked with flat namespaces: they needed to use the same ID for a particular user.

That was seen as a disadvantage if federation was to be used between e.g. different companies. The chance that the same user had the same ID on both sides was seen as minimal.

That is still true to some extent today, but it turns out it is not such a big problem.

In enterprise use cases the employees are the ones that are federated to cloud services or to partner companies. The user eMail address does just fine for that.

Consumer type use cases nowadays use OAuth or OpenID Connect and usually create the user profile on the fly on the first federation. I'll write about OAuth/OpenID Connect later in this series - let's finish the SAML part first before opening up another box full of toys...

Transient

The IDs for urn:oasis:names:tc:SAML:2.0:nameid-format:transient look the same as the ones for the persistent NameID format, but they behave differently. The IDP generates a new opaque identifier every time the user is federated out. So the SP shouldn't even try to map this to a known local user. That would be hopeless - the next time the same user is federated over, the ID will be different.

Mapping to a known local SP user is not the intent behind the transient NameID Format. It is all about federating out temporary users so that the SP can tell that they come from a trusted IDP and that maybe that user has certain attributes (if included in the assertion). The SP is supposed to handle this temporary user just as such: create a temporary account and forget (delete) it after use (or a short time after).

This is useful for SPs that do not rely on a real user as the basis for their offering but only need to know that a trusted business partner (the IDP) trusts those users enough to let them use the service at the SP. An example would be a service that provides real-time stock charts. It really doesn't care who the user is as long as the user comes from a partner (e.g. a bank) that has a relationship with the SP (and presumably pays a flat fee for all its customers to use the service).
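Here is a small sketch of the transient behaviour, again with made-up names: the IDP hands out a fresh random value on every federation, and the SP keeps a throwaway account only for as long as it is needed. The attribute name and the bank entity ID are hypothetical.

```python
import secrets


def new_transient_name_id() -> str:
    """IDP side: a brand new opaque value for every single federation."""
    return secrets.token_hex(20).upper()


class TemporaryAccounts:
    """SP side: throwaway accounts keyed by (IDP, transient NameID)."""

    def __init__(self) -> None:
        self._accounts = {}

    def start(self, idp_entity_id: str, name_id: str, attributes: dict) -> None:
        # No lookup of a local user: the ID will never be seen again.
        self._accounts[(idp_entity_id, name_id)] = dict(attributes)

    def end(self, idp_entity_id: str, name_id: str) -> None:
        # Forget the temporary user after use.
        self._accounts.pop((idp_entity_id, name_id), None)

    def active(self) -> int:
        return len(self._accounts)


bank = "https://bank.example/idp"  # hypothetical trusted partner
charts = TemporaryAccounts()

visit = new_transient_name_id()
charts.start(bank, visit, {"plan": "real-time charts"})
print(charts.active())  # 1
charts.end(bank, visit)
print(charts.active())  # 0
```

All the SP ever learns is that some user vouched for by the bank showed up, which is exactly what the stock chart example needs.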

Unspecified

"Unspecified" doesn't mean it is unimportant. Quite the opposite. It is the format to fall back on if none of the other NameID formats fit your requirements because

  • Persistent requires the user to log into the SP manually the first time
  • Transient isn't persistent (doh!)
  • eMail doesn't fit 'cause you don't know the eMail of the user
  • Kerberos is not your thing
  • You don't use certificates (X.509)

 

"Unspecified" NameID format doesn't mean you make up your own NameID Format Identifier. There is actually a NameID format Identifier for Unspecified: urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified

If you use this in your SAML assertions you assume that the SP (and potentially other members of the circle of trust) know what the ID means because you agreed on what this means beforehand.

"Unspecified" today is also used to transport just Email addresses as NameIDs - it's basically used as a dummy NameID format.

 

An example:

A country has a unique ID for each citizen (1984 for me please!). The IDPs would like to tell the SPs this unique ID.

 

At first look it is tempting to use the persistent NameID Format. Which is so wrong! That has been done before (I know of at least two EU government projects which did this in the past) but it doesn't work and is actually against the SAML standard because... "Persistent name identifiers generated by identity providers MUST be constructed using pseudo-random values that have no discernible correspondence with the subject's actual identifier (for example, username)." Remember?

 

So instead they should use urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified - and tell all the SPs and IDPs what that means for them (= use the citizen ID as the NameID).
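As a sketch of what the IDP would then put into the assertion, the following builds a Subject carrying a citizen ID with the "unspecified" format, using only the Python standard library. The helper name build_subject and the citizen ID value are made up, and a real assertion of course contains far more than a Subject.

```python
import xml.etree.ElementTree as ET

SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
UNSPECIFIED = "urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified"

# Serialize with the usual "saml:" prefix instead of an auto-generated one.
ET.register_namespace("saml", SAML_NS)


def build_subject(name_id_value: str, name_id_format: str = UNSPECIFIED) -> str:
    """Return a serialized Subject element with a single NameID."""
    subject = ET.Element("{%s}Subject" % SAML_NS)
    name_id = ET.SubElement(
        subject, "{%s}NameID" % SAML_NS, {"Format": name_id_format}
    )
    name_id.text = name_id_value
    return ET.tostring(subject, encoding="unicode")


# "1984" is just the joke citizen ID from the example, not real data.
print(build_subject("1984"))
```

What the value means (here: the citizen ID) is not visible anywhere in the XML. That agreement lives outside the assertion, between the IDPs and SPs.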

 

One EU member state uses federation with NameID format "unspecified" to transfer a "doctor ID". Perfect. Good choice! They did it right!

 

Of course you could also completely make up your own NameID Format Identifier.

For example urn:rsa:names:federation:nameid-format:really-cool-format

Whether this is really needed or whether you can get away with just using "unspecified" is really up to the IDP and/or SP you are dealing with.

Whatever you do however: don't misuse persistent to do things it is not meant to do. That will cause huge issues.

How does RSA Via Access handle NameID Formats?

RSA Via Access can handle all the SAML 2.0 NameID formats as an IDP.

This is configurable in the "Advanced Configuration" section of the "Connection Profile" screen for the application (a.k.a. SAML SP) you are configuring.

Changing that will set the selected NameID format accordingly in the SAML assertion. There is one caveat to this: persistent and transient NameIDs will not be generated; instead, whatever is set as the subject of the assertion is sent.

As an SP, RSA Via Access assumes the NameID sent over maps to the LDAP attribute set as the user tag (a.k.a. logon name). So RSA Via Access (as of writing this post) ignores the NameID format of incoming SAML assertions.
