Sharing our backup script v1 vs v2 experience and some thoughts on the matter
We've been prepping to upgrade from 10.6.2 to 10.6.4 and reviewing the backup scripts and docs provided by RSA.
We also had an opportunity to test them out while migrating a decoder from SD to HDD [we used v2 despite the lack of official certification for that; Core isn't very complicated].
I'd like to share our experience to raise awareness of the improvements and fixes in the newer version of the scripts, and to put pressure on the RSA product team to certify the v2 script for 10.6.2.
First off - the scripts are quite nice:
- You can centrally get an appliance list from the head unit, propagate SSH keys, and then remotely back up your appliances to your head unit [generally streaming via tar over SSH].
- It also excludes the common locations with 'long backup times' [malware repo for files, mnesia db stats bits on log collector, run reports on RE], which you can re-include with optional args.
- In v2 you can also move the backups off to an external mountpoint automagically, plus some other things.
I suppose it looks like RSA is prepping for the v11 upgrade - hence the -U switch [we'd have to buildstick everything for 11, I hear].
- v2 provides better instructions on migrating custom users/groups, etc.
- v2 has disk space check options.
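For readers who haven't looked inside the scripts, the "tar over SSH with exclusions" idea can be sketched roughly as below. This is not the RSA script's code, just a minimal illustration; the `root@` login, the host name and the paths passed in are all assumptions for the example.

```python
import shlex
import subprocess


def build_backup_command(host, paths, excludes=()):
    """Build the ssh argv that makes the remote host write a gzipped tar to stdout.

    host, paths and excludes are illustrative inputs, not the script's real defaults.
    """
    remote = ["tar", "-czf", "-"]                    # "-" = write the archive to stdout
    remote += [f"--exclude={p}" for p in excludes]   # skip the slow/large locations
    remote += list(paths)
    # BatchMode makes ssh fail instead of prompting, which suits key-based automation.
    return ["ssh", "-o", "BatchMode=yes", f"root@{host}", shlex.join(remote)]


def stream_backup(host, paths, dest_file, excludes=()):
    """Run the remote tar and stream its output into a local file on the head unit."""
    cmd = build_backup_command(host, paths, excludes)
    with open(dest_file, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)  # raises if ssh or tar fails
```

Re-including an excluded location is then just a matter of passing a shorter `excludes` list, which is roughly what the optional args do from the user's point of view.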
1) Some of the things we really did not enjoy with the scripts:
- The script looks to have been substantially rewritten, but there is no change log or known issues list for v1 vs v2.
Support seem to be blindly recommending v1 without actually testing it or having experience running it:
i) The v1 script writes the ESA backup to a small partition using mongodump - potentially a non-restorable ESA - fixed in v2. Maybe we'd pick up mongodump failing and the partition filling up via health policy... if it works.
ii) The v1 script does not back up the postgres DB on the malware server (that doesn't sound restorable).
iii) No guidance on validating output - although there is a log, and v2 also does checksums.
iv) Both scripts don't back up the /etc/netwitness/ng/Geo* feed dat files [domain/org/country/etc meta won’t get tagged – bad] - no mention in the PDFs. A useful reference is 000035021 - How-to Update the geoIP Databases on RSA NetWitness decoders [well, you can get them out of the RPMs, but it's curious to see it's not a feed. Feed redistribution negotiation with Maxmind failed, I suppose?]
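Since there's no official guidance on validating the output, here is the kind of check we'd want to run on each archive: compare a SHA-256 against a recorded value and confirm the tarball can actually be listed end to end. It's a sketch only; I'm assuming the backups are tar archives and that you have an expected checksum to compare against, and I haven't confirmed which hash the v2 script itself uses.

```python
import hashlib
import tarfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so large archives don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_backup(archive_path, expected_sha256):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if sha256_of(archive_path) != expected_sha256.lower():
        problems.append("checksum mismatch")
    try:
        # "r:*" lets tarfile detect the compression; listing walks every header,
        # so a truncated archive should fail here.
        with tarfile.open(archive_path, "r:*") as tar:
            if not tar.getmembers():
                problems.append("archive is empty")
    except (tarfile.TarError, OSError, EOFError) as exc:
        problems.append(f"archive unreadable: {exc}")
    return problems
```

A matching checksum only proves the file wasn't altered after it was written; it says nothing about whether the right things were in it, which is why a test restore is still the real validation.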
2) Build stick doc quality was suspect.
- Some KBs we were referred to were very vague. The most useful one was 000029977 - Instructions for build sticking an RSA Security Analytics appliance using the "SA 10.4.0.2B" image.
- HW compatibility issues were not well documented [plus the variety of r610-20-30 hardware is not well documented in terms of BIOS options either. Especially r610 - e.g. hiding PERC8xx PCI vs unplugging DAC].
- Cold run testing was very unclear [test your build stick... before wiping the RAID config. We've had an issue where the boot menu comes up OK, but then it can't find the KS scripts and sees no USB, so changing SDA/SDB/SDC doesn't work - a different brand worked OK].
- We have provided some suggestions on improving documentation and removing some older, less helpful docs.
- Support teams in some regions have never touched HW appliances, and RSA keep trying to push back to professional services for backup/restore/rebuild [mmm, come on...].
- Realistically, I take it the v1 script has been well tested for core appliances and less can go wrong with non-core, but I personally would be a lot less comfortable running v1 on real servers vs v2.
So yes, if you're on <= 10.6.2 - stand up for your rights - put some pressure on your account manager so they have a chat with the product team. Oh, and test your buildsticks:
- Get RSA to back-certify the v2 scripts. Don't accept extra operational risk from RSA by running v1. Backup and restore should be easy.
- Get RSA to publish a diffs list for the scripts (and a fix list).
- Push RSA to document and fix the build stick processes and improve documentation - this shouldn't be a painful, difficult process.
- Get some test servers to play with. Virtual?
Thanks for listening.
I'll work with the documentation to revise the buildstick procedure and to incorporate a change log for the backup & restore script. Since we released v2 with 10.6.4, it's the recommended and supported script. Customers shouldn't have a reason to use v1 now that v2 is released.
We really appreciate the feedback! We'll continue to update the script with future release cycles.
Product Manager - Platform
> Since we released v2 with 10.6.4, it's the recommended and supported script. Customers shouldn't have a reason to use v1 now that v2 is released.
Our biggest issue is that the v2 script cannot be officially used with SA 10.6.2, according to Support.
The KB says v2 is for SA >= 10.6.3. The PDF says v2 is for >= 10.6.2. The script itself says v2 is for 10.6.3 and 10.6.4, but untested and unsupported for <= 10.6.2.
^ which is also less than clear, but I'd say the bits in the script are the authoritative ones?
v2 says only 10.6.3 and 10.6.4 are qualifying tested versions and 10.6.2 is not officially supported
(although I suspect it's quite usable, especially for core appliances).
We've been trying hard to get it certified for use with 10.6.2 but had trouble doing so via Support/AM.
Are there any plans to backport official v2 script support for customers on <= 10.6.2?
Or are you saying v2 is officially supported on SA 10.6.2 as well? If I'm reading your comment correctly, the v1 script/links should be deactivated.
Like I said, the script improvements in v2 seem very significant (including bug fixes for issues that would otherwise result in non-restorable appliances), and the lack of a change log and backported support for 10.6.2 is concerning.
PS: there's at least one more bug with the v2 script. Reported it to Support.
When doing the thorough disk size calc (-D), it doesn't consider the backup exclusion options (the default ones, e.g. malware repo, RE run reports, log collector mnesia).
This prevents the script from running correctly and in a timely manner: it consumes extra compute, reports the wrong required disk space, and may result in unexpectedly missing backups (the script hard-fails the backup after incorrectly estimating the required space and concluding it shouldn’t proceed).
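To illustrate what I'd expect the size calc to do, here's a rough sketch of an estimate that prunes the excluded locations before summing, so the number matches what would actually be archived. This is my own illustration, not the script's logic; the function names, the `headroom` factor and the idea of comparing against free space on the destination are assumptions.

```python
import os
import shutil


def estimate_backup_bytes(root, excludes=()):
    """Sum file sizes under root, skipping any excluded file or directory."""
    excluded = {os.path.normpath(p) for p in excludes}
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into an excluded directory;
        # this is also what keeps the calculation fast.
        dirnames[:] = [d for d in dirnames
                       if os.path.normpath(os.path.join(dirpath, d)) not in excluded]
        for name in filenames:
            path = os.path.normpath(os.path.join(dirpath, name))
            if path in excluded or os.path.islink(path):
                continue
            try:
                total += os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return total


def has_room(root, dest_dir, excludes=(), headroom=1.1):
    """True if dest_dir has enough free space for the estimate plus some margin."""
    needed = estimate_backup_bytes(root, excludes) * headroom
    return shutil.disk_usage(dest_dir).free >= needed
```

With the exclusions ignored, the estimate includes data that will never be backed up, which is exactly how you end up with a hard fail on a backup that would have fit.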