Skip navigation
All Places > RSA Labs > Blog > 2017 > November

By design containers are meant to be disposable. They are meant to be shipped around to different environments and brought up and down at will. For instance, a container orchestration technology like Kubernetes can automatically bring up new containers in response to a spike in demand, and then tear down the same containers when the demand subsides. Or, as part of the continuous delivery life cycle, the same container image running on a developer's laptop can be spun up in a test environment for verification and then deployed in production by an operations team.


AI-based security solutions like Project Syn use training data to learn what's normal and flag abnormal activity. But the impermanence of containers poses an interesting problem: how can a machine learning system gain the right amount of insight about individual containers in order to raise meaningful alerts? An overly sensitive system that raises alerts before it has enough data to extrapolate from will generate noise and false positives. But an overly conservative system that waits too long for enough data to become available will fail to raise important alerts and result in false negatives.


Container Profiles


Project Syn addresses this problem with the concept of container profiles. In a nutshell, container profiles allow for behavior learned about one container to be shared with similar containers run later in time. This means that for many containers, the training stage can be bypassed altogether, and alerts can be generated immediately after deployment.


Let's take the example of continuous delivery, shown in the figure below. Suppose a new container image is in the process of being deployed to production. First a container from this new image is deployed in a staging environment. After a set training period, Project Syn creates a profile for this container, which captures the behavior learned about this container.



At a later point in time, after the container image has passed the requisite checks in staging, a new container (or set of containers) is deployed in production from the same image. Since Project Syn already has a profile from a previous container coming from the same image, it applies that profile to the new container. The new container bypasses training, and if it happens to be compromised shortly after deployment, Project Syn can immediately raise alerts to that effect.


Profile Matching


How does Project Syn determine when a profile can be applied to a container? It's not based simply on the container's image. Containers run from the same image can exhibit very different behavior based on how they are run. Project Syn uses containers' runtime metadata, such as command line arguments and ports, as part of profile matching.


For example, let's compare three nginx web server containers that are run from the same nginx image. Container A runs only with a private port and is only accessible on the same local virtual network as the container. Container B exposes its private port on port 80 and is accessible from outside the container host (assuming the host firewall is open). Container C exposes its private port on port 8080 and is also accessible outside the container host.



In this case, there are two unique profiles, one profile for container A, and one profile for both container B and C. The difference in public port for container B and C (80 vs 8080) doesn't represent a meaningful difference in behavior.


Profile Matching Using Labels


If you want to, you can explicitly control how profile matching works using Docker object labels. Labels are custom metadata in the form of key-value pairs that can be attached to containers.


Here's how it works: first, you tell Project Syn which label keys you want Project Syn to use for profile matching. When you run a container, you run it with those same labels, and set the label values appropriately. Containers with the same label key-value pairs are matched to the same profile.


What's Next


Container profiles today only work within the context of a single customer. It's not hard to see a future in which customers can opt-in to share profiles with and use profiles from RSA and other customers. This would enable the community to collectively improve container security for everyone. 


Stay tuned for more updates!

It’s an essential question for security teams following a cyber attack: Where did the threat originate? In the days and weeks following the WannaCry ransomware attack—which swept through 150 countries, infecting hundreds of thousands of computers—reports emerged pointing to various potential actors. But none of the insights came soon enough to help defend against the attack. Unfortunately, the type of analysis used to derive them just doesn’t work that fast. The good news is there are other approaches that do.


Dynamic analysis of WannaCry and its possible origins required hours of manual code inspection. As a result, the first clues took several days to emerge, and further insights took weeks. The problem is the process entails manually comparing thousands of code segments from dozens of known malicious actors. As the volume of new malware threats grows (the AV-TEST Institute reports registering over 390,000 new malicious programs daily), that problem is only going to get worse. Dynamic analysis simply can’t scale to compare code quickly enough to identify the origins of a new piece of malware in a timely way.


Dynamic analysis can help determine the runtime effects of a piece of malware, but with tools for sandbox detection and evasion becoming increasingly common, its value is limited. Besides, knowing what a piece of malware does won’t help with file similarity analysis, as there may be dozens of ways to achieve that result. Comparing file hashes has never really been useful, either, since attackers routinely leverage code polymorphism to ensure each piece of malware has a unique hash. What about fuzzy hashing as a tool for file similarity analysis? It’s increasingly being used to measure how similar two binaries are. The challenge is fuzzy hashing tools like ssdeep are applied to the entire file and can’t catch similarities more complex than one file being related to another.


But what if fuzzy hashing could be applied to pick up code similarity at a more granular level? That thinking has led RSA to a new static analysis technique for detecting complex similarities and, moreover, identifying similarities from multiple pieces of malware. Through this approach, we can create a malware genome, if you will, that provides an understanding of how malware evolved, even when it’s an amalgamation of multiple malicious tools. Beyond mapping out code capabilities, this genealogy may shine some light on the malicious infrastructure and exchange of tools happening on the attacker side.


As a service to others engaged in threat investigation, we’re freely sharing the tool we’ve been using to explore this approach. Our hope is WhatsThisFile will help defenders evaluate unknown files faster, discover similarities to known malware and quickly gain the insights needed to better defend their enterprises.