
Cannot enable appliance after 10.4 upgrade

Question asked by Tomi Reiman on Nov 7, 2014
Latest reply on Nov 10, 2016 by Arthur Costigan

Hello,

 

We first upgraded one log hybrid from 10.3.4 to 10.4 without any real issues. That log hybrid is on the same subnet as the SA server. We then upgraded a second log hybrid, one which is not on the same subnet as the SA server. The installation hung, and we now believe the reason is that the upgrade instruction document does not mention the ports used by Puppet, RabbitMQ, MCollective, and so on. During that install we clicked the Upgrade button in the GUI, but never got the Reboot or Enable button.

We eventually got the puppet agent working (at least it seems that way) and even removed and repurposed the appliance from the SA server. Earlier today I had to manually sign the puppet agent's request on the puppet master, and after that the appliance itself showed a message requesting a manual reboot. The message never went away, no matter how many times we rebooted. We then removed all the SSL stuff from the log hybrid, cleaned the certificates from the puppet master and ran puppet agent --test --waitforcert 30. That time the puppet master even signed the request, which gave us the pop-up window in the SA GUI containing the fingerprint. However, we still haven't got rid of the Enable button. I suspect that with a few more tries I might get it to display Reboot Required, but that would never go away either.
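
Since the hang is attributed to ports that were not open between the subnets, a quick reachability test run from the log hybrid toward the SA server can confirm whether a given TCP port is open. The following is only a minimal sketch in Python; the function name `check_ports` is made up for illustration, and the authoritative list of ports required for a 10.4 upgrade has to come from the vendor documentation, not from this example.

```python
import socket


def check_ports(host, ports, timeout=3.0):
    """Return {port: bool} telling whether a TCP connection to host:port succeeds.

    Hypothetical helper for illustration; it only tests TCP reachability,
    not whether the service behind the port accepts our credentials.
    """
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and opens a TCP connection.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and name resolution errors.
            results[port] = False
    return results
```

For example, `check_ports("puppetmaster.local", [8140, 5671])` would test Puppet's default master port (8140) and the AMQP-over-TLS port that shows up in the RabbitMQ log (5671). Whether 5671 must actually be reachable across subnets in this product is an assumption here.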

 

I think syslog collection is working ok, but the log collector service is not. Here are some sample errors gathered from all around the system:


log hybrid - /var/log/messages:

[AMQPClientBase] [failure] An error occurred creating an AMQP channel: : connection closed unexpectedly

[BufferedChannel] [failure] An error occurred publishing to an AMQP channel: : connection closed unexpectedly

[EventBroker] [failure] failure in updating statistics for: No such node (stats)

 

log hybrid - /var/log/rabbitmq/sa@localhost.log:

=WARNING REPORT==== 7-Nov-2014::17:30:03 ===

HTTP access denied: user 'logcollector' - invalid credentials

 

 

=ERROR REPORT==== 7-Nov-2014::17:30:03 ===

webmachine error: path="/api/nodes"

"Unauthorized"

 

 

=WARNING REPORT==== 7-Nov-2014::17:30:03 ===

HTTP access denied: user 'logcollector' - invalid credentials

 

 

=ERROR REPORT==== 7-Nov-2014::17:30:03 ===

webmachine error: path="/api/connections"

"Unauthorized"

 

 

=WARNING REPORT==== 7-Nov-2014::17:30:03 ===

HTTP access denied: user 'logcollector' - invalid credentials

 

 

=ERROR REPORT==== 7-Nov-2014::17:30:04 ===

closing AMQP connection <0.983.2> (127.0.0.1:48949 -> 127.0.0.1:5671):

{handshake_error,starting,0,

                 {amqp_error,access_refused,

                             "PLAIN login refused: user 'logcollector' - invalid credentials",

                             'connection.start_ok'}}

 

 

=ERROR REPORT==== 7-Nov-2014::17:30:05 ===

closing AMQP connection <0.987.2> (127.0.0.1:48950 -> 127.0.0.1:5671):

{handshake_error,starting,0,

                 {amqp_error,access_refused,

                             "PLAIN login refused: user 'logcollector' - invalid credentials",

                             'connection.start_ok'}}

 

 

=ERROR REPORT==== 7-Nov-2014::17:30:05 ===

closing AMQP connection <0.991.2> (127.0.0.1:48951 -> 127.0.0.1:5671):

{handshake_error,starting,0,

                 {amqp_error,access_refused,

                             "PLAIN login refused: user 'logcollector' - invalid credentials",

                             'connection.start_ok'}}
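
The RabbitMQ excerpt above repeats the same rejection many times, so it can help to summarize which accounts are being refused. Below is a small sketch in Python that counts "invalid credentials" entries per user; the regular expression is derived only from the two message shapes quoted above, and the function name `count_auth_failures` is hypothetical.

```python
import re
from collections import Counter

# Matches both "HTTP access denied: user 'X' - invalid credentials" and
# "PLAIN login refused: user 'X' - invalid credentials" as seen in the log.
AUTH_FAILURE = re.compile(r"user '([^']+)' - invalid credentials")


def count_auth_failures(log_text):
    """Return a Counter mapping user name to number of rejected logins."""
    return Counter(AUTH_FAILURE.findall(log_text))
```

Feeding it the contents of /var/log/rabbitmq/sa@localhost.log would show whether 'logcollector' is the only account being rejected or whether other service accounts are affected as well.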

 

In /var/log/puppet/masterhttp.log on the SA server the log looks completely OK around every puppet test attempt - no error status codes are reported. For example:

 

[07/Nov/2014:22:37:46 UTC] "POST /production/catalog/530e71a6-d288-4b25-a2fe-455a10398a91 HTTP/1.1" 200 49979

 

Is it somehow possible to start from square one, that is, from the point before everything went wrong, where I would see the "Upgrade to 10.4" button in the SA GUI instead of the red or yellow Reboot Required or Enable buttons? Or is there a simpler way to fix things? Honestly, I do not know what else might be corrupted or broken by the interrupted first installation attempt. It was interrupted because the ports for Puppet communication etc. were not open, and we were not aware of that requirement. Re-imaging the device is not an option. What does the Enable button in the SA GUI actually initiate? If I knew that, it might be easier for me to track down and debug the issue.

 

Finally, on the log hybrid the /etc/hosts includes:

 

x.x.x.x puppetmaster.local

 

where x.x.x.x is the IP address of the SA server.
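
To double-check that the hosts entry is picked up as intended (one mapping, pointing at the SA server), the file can be parsed directly. This is a minimal Python sketch; `hosts_entries` is a hypothetical helper, and the address in the usage note is a documentation placeholder, not the real IP.

```python
def hosts_entries(hosts_text, hostname):
    """Return every address that a hosts-format text maps to hostname."""
    wanted = hostname.lower()
    addresses = []
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # First field is the address, the rest are names and aliases.
        if wanted in (name.lower() for name in fields[1:]):
            addresses.append(fields[0])
    return addresses
```

Calling `hosts_entries(open("/etc/hosts").read(), "puppetmaster.local")` should return a single address; with a placeholder entry such as `192.0.2.10 puppetmaster.local` it would return `["192.0.2.10"]`. An empty list or more than one address would be worth investigating.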

 

Does anyone know how to approach the issue further, or how to force a 10.4 re-upgrade so that it runs as it did the first time? I have no problem re-installing every RPM as long as no data or configuration regarding event sources is lost. I have a support ticket open regarding this case, but I am seeking further help here in the hope of finding some over the weekend.
