Specify User and Permissions when Using SaltStack Managed Files

I recently ran into a bug while using SaltStack that was obscure enough that I thought I’d cover it here, in case anyone else runs into a similar issue. As part of a recent client project, we were tasked with updating and fixing an existing SaltStack-based infrastructure. The system itself was pretty well set up; it just needed a lot of cleanup and updating.

We were also tasked with rolling out an entirely new production infrastructure based on the existing SaltStack configuration, which until then had only been running in staging and development environments. The rollout of the production infrastructure went pretty well, but afterwards there were a few cleanup tasks that needed looking into before we could finalize things.

The Issue

The issue manifested itself as a lack of JMX metrics on the Google StackDriver monitoring dashboard. For those unfamiliar with Google Cloud, StackDriver is its monitoring and alerting tool. JMX (Java Management Extensions) metrics are collected from the Java Virtual Machine at runtime and can be used to monitor things like open files and active threads.

For web servers running Java applications on the JVM, these metrics can be a great way to detect resource and application-level issues early. It’s also possible to expose any custom metric via JMX, which lets developers add their own application metrics as desired.

The Investigation

I started my investigation by looking at the configuration that Salt was placing on the servers. The StackDriver Agent appeared to be the most recent version, and the credentials seemed to be installed correctly, allowing the servers to communicate with the StackDriver API. This was borne out by the fact that basic server metrics were showing up just fine in the StackDriver interface; only the JMX metrics were missing.

StackDriver uses a standalone configuration file for JMX, which specifies the local port on which the JVM exposes its JMX metrics. The StackDriver Agent collects the metrics from that port and relays them to the API, which stores and displays them.

I checked the StackDriver Agent service status using service stackdriver-agent status. The output indicated that everything was running normally, but the list of loaded configurations showed no sign of JMX or Java. Since the JMX configuration is only loaded if the agent detects it, I suspected that the current configuration file might be the culprit.

To test this theory, I downloaded the default JMX configuration template into the same directory as the existing JMX config. Running diff conf1.conf conf2.conf revealed that the files were identical except for the port number of the JMX listener, which was of course different in the default file. In addition, the default file ended with a trailing newline.

I began to suspect that, for some reason, a trailing newline was required for the StackDriver Agent to recognize the configuration file. Strange, but not unheard of in the realm of configuration quirks. I added a trailing newline to the configuration file, restarted the service and ran service stackdriver-agent status again.

This time the list of loaded configurations included Java! I checked the StackDriver dashboard to make sure the metrics were in fact coming through, then modified the SaltStack template to add a trailing newline to the configuration file. I then applied the updated configuration to another server via Salt and waited for the metrics to show up.

The Resolution

The metrics never did show up, and the service status showed that Java was not among the loaded configurations on the StackDriver Agent. I downloaded the default configuration on the second server as well and compared it with the managed file to confirm that the two were in fact identical.

A diff showed no differences between the two files. I then restarted the StackDriver Agent and checked the status once more. This time the Java configuration was listed as loaded! I realized that the agent was loading the default configuration while ignoring the one managed by Salt.

At this point I knew it wasn’t an actual configuration issue, since both files were identical, so I began to suspect permissions. I ran stat conf1.conf and then stat conf2.conf, and immediately noticed that the Salt-managed file was owned by the user that ran our web services, while the downloaded default config was owned by root.

It then occurred to me that the StackDriver Agent runs as root and may ignore configuration files owned by other users. Running chown root:root conf1.conf solved the issue: after restarting the StackDriver Agent, it loaded the Java configuration correctly from the managed file.

I added the following entries under the file.managed state for the StackDriver JMX configuration in the appropriate .sls file and applied it to another server. Sure enough, JMX metrics came through almost immediately.

- user: root
- group: root
- mode: 644
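
For context, a complete state using these entries might look roughly like the sketch below. It is an illustration under assumptions rather than the actual project configuration: the state ID stackdriver-jmx-config, the target path, the salt:// source and the use of a Jinja template are all hypothetical placeholders, since the post doesn’t name them. The second state uses Salt’s watch requisite so that the stackdriver-agent service is restarted whenever the managed file changes, which takes care of the manual restart described above.

```yaml
# Hypothetical state ID; adjust names and paths to your own Salt tree.
stackdriver-jmx-config:
  file.managed:
    # Hypothetical location of the agent's JMX config file
    - name: /opt/stackdriver/collectd/etc/collectd.d/jvm-sun-hotspot.conf
    # Hypothetical template path on the Salt file server
    - source: salt://stackdriver/files/jvm-sun-hotspot.conf
    # Assumes the template is rendered with Jinja
    - template: jinja
    # Explicit ownership and permissions, as described above
    - user: root
    - group: root
    - mode: 644

stackdriver-agent:
  service.running:
    - enable: True
    # Restart the agent whenever the managed JMX config changes
    - watch:
      - file: stackdriver-jmx-config
```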

Lessons Learned

When managing files with Salt, never assume that the default ownership or permissions are sufficient. Make sure you know exactly which user and group should own each managed file and who needs to read it, and set user, group and mode explicitly in the state.