Better netflow visualization with code_swarm coolness!

Howdy all,

In my last post, I may have mentioned codeswarm, a nifty tool for visualizing how frequently a software project gets updated over time. Since it’s an open-source project, I figured that it was worth having a look at the code and seeing if there are other uses for it.

If you check out the Google Code page, you’ll notice that the project isn’t terribly active – the last upload dates back to May 2009. But hey, it does what it’s supposed to do and it’s pretty straightforward.

Reading through the source files, it turns out that using the tool is super simple: you set up an XML file that contains the data to be used, you run Ant, and you let the program do the rest. The format of the sample data is very simple, frankly: a file name, a date, and an author.

So let’s see what other uses we could come up with. Here are a few ideas I thought might be cool:

  • What about adapting it to track your social media messages? First, if you’re following a lot of people, it would look wicked cool. Second, if you’re trying to prune your Follow list, that could be really practical for figuring out who’s the noisiest out there.
  • Sometimes when you’re trying to figure out bottlenecks in your traffic, it’s useful to have a decent visualization tool. Maybe this could be helpful!
  • Finally, you sometimes need a good way to track employee activities. Would this not be a kickass way to see who’s active on your network?

I decided to work on the second idea. I’m not looking to rework the code at this point, just to reuse it with a different purpose.

Prerequisites

To pull this off, you’re going to need the following:

  • The codeswarm source code and Java, so that you can run the code on your system
  • Some netflow log files to test out
  • flow-tools, so that you can process said netflow log files
  • A scripting language so that you can process and parse the netflow traffic into XML. My language of choice was Ruby, but it could be as simple as bash.

The netflow filter

Before we can parse the netflow statistics into the appropriate format, we need to know what we’ll be using and how to extract it. Here’s what I used: each IP endpoint should have its own line; the IP address maps to the “author” field (because that’s what is visible). The protocol and port will map to the filename field, and the octets in the flow will map to the weight field.

The following is the netflow report config file. You should save this in the codeswarm directory as netflow_report.config:

stat-report t1
 type ip-source/destination-address/ip-protocol/ip-tos/ip-source/destination-p$
 scale 100
 output
  format ascii
  fields +first
stat-definition swarm
 report t1
 time-series 600

If you save some netflow data in data/input, you can test out your report by running this line:

flow-merge data/input/* | flow-report -s netflow_report.config -S swarm

Parsing the netflow

If the report worked out correctly for you, the next logical step is to write the code to create the .XML file that will be parsed by codeswarm. You’ll want to set your input directory (which we’d said would be data/input) and your output file (for instance, data/output.xml).

Here’s the source code for my grabData.rb file:

#!/usr/local/bin/ruby
# Prepare netflow data for codeswarm.
output_file_path = "data/output.xml"
input_directory = "data/input"
# Grab the netflow information using flow-tools
input = `flow-merge #{input_directory}/* | flow-report -s netflow_report.config -S swarm`
# This is the part that gets a bit dicey. I believe that in order to properly visualize
# the traffic, we should add an entry for each party of the flow. That's exactly what we're
# going to do. The "author" in this case is going to be the IP address. The "filename" will
# be the protocol and port. The weight will be the octets.
File.open(output_file_path, "w") do |output_file|
  output_file << "<?xml version=\"1.0\"?>\n"
  output_file << "<file_events>\n"
  input.split("\n").each do |line|
    # Skip blank lines and the header line that names the record fields ("recn").
    next if line.strip.empty? || line =~ /recn/
    fields = line.split(",")
    timestamp = fields[0] # timestamp column, used as the event date
    source = fields[1]
    dest = fields[2]
    srcport = fields[3]
    dstport = fields[4]
    proto = fields[5]
    octets = fields[8].to_i / 1000 # scale the weight down to thousands of octets
    # One event per endpoint of the flow.
    output_file << " <event filename=\"#{proto}_#{srcport}\" date=\"#{timestamp}\" author=\"#{source}\" weight=\"#{octets}\"/>\n"
    output_file << " <event filename=\"#{proto}_#{dstport}\" date=\"#{timestamp}\" author=\"#{dest}\" weight=\"#{octets}\"/>\n"
  end
  output_file << "</file_events>"
end

And we’re done! This should generate a file called data/output.xml, which you can then use in your code swarm. You can either edit your data/sample.config file or copy it to a new file, then run ./run.sh.

Reality Check

I was really excited when running my first doctored code swarm; unfortunately, though the code did work as expected, the performance was terrible. This was because the sample file that I used was rather large (over 10K entries), probably considerably more than what the authors had expected for code repository checkins. Also, I suspect that my somewhat flimsy graphics card is unable to handle realtime rendering of the animation, so I set up the config file to save each frame to a PNG so I could reconstitute the animation later. The ffmpeg syntax for stitching the frames back together is:

ffmpeg -r 10 -b 1800 -i %03d.jpg test1800.mp4

Moreover, I believe my scale was off; I changed the number of milliseconds per frame to 1000 (1 frame, 1 second).

The second rendering was much more interesting, but it did yield a heck of a lot of noise; let’s not forget that we’re working with hundreds, if not thousands, of IP addresses. However, if we do a little filtering we can probably make the animation significantly more readable.
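As a minimal sketch of the kind of filtering meant here, the snippet below keeps only the events that belong to the heaviest IP addresses. It assumes the events are held as hashes before being written out; the helper name `top_talkers`, the hash layout and the sample values are all made up for illustration and are not part of codeswarm or flow-tools.

```ruby
# Hypothetical helper: keep only events whose author (IP address) is among
# the `limit` addresses with the highest total weight.
def top_talkers(events, limit)
  totals = Hash.new(0)
  events.each { |e| totals[e[:author]] += e[:weight] }
  keep = totals.sort_by { |_ip, weight| -weight }.first(limit).map(&:first)
  events.select { |e| keep.include?(e[:author]) }
end

# Made-up sample events, in the same shape as the XML attributes above.
events = [
  { filename: "6_80",  date: "1000", author: "10.0.0.1", weight: 500 },
  { filename: "6_443", date: "1000", author: "10.0.0.2", weight: 20 },
  { filename: "17_53", date: "1600", author: "10.0.0.3", weight: 1 },
  { filename: "6_80",  date: "1600", author: "10.0.0.1", weight: 300 }
]

top_talkers(events, 2).each do |e|
  puts " <event filename=\"#{e[:filename]}\" date=\"#{e[:date]}\" author=\"#{e[:author]}\" weight=\"#{e[:weight]}\"/>"
end
```

Filtering on a port or protocol instead would only change the `select` condition; which cut is most readable probably depends on the network being visualized.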

All in all, this was a rather fun experience but a bit of a letdown. Codeswarm wasn’t meant to handle this high a volume of data, so it makes things tricky, and less readable than what I expected; if you play with your filters, you will definitely be able to see some interesting things but if you’re looking for a means to visually suss out what’s happening on your entire network, you are bound to be disappointed. By next time, I hope to talk a bit about more appropriate real-time visualization tools for netflow and pcap files, maybe even cut some code of my own.

Times (and blogs), they are a-changing…

Hello, one and all!

Welcome to the new FAIL tales! I hope you like our new digs. Simpler yet more sophisticated, methinks.  I’ve really grooved on Blogger these past years; it’s easy to post, easy to customize, and, since it’s maintained by someone else, it’s a cinch to administer. But it’s once again time for change, time to move on to bigger and better things.

I was once told by an exceptionally insightful professor that, as computer scientists, we were going to be learning throughout our entire careers. “Computer Science is all about change,” he would say, “get used to the idea that you’ll constantly have to re-learn everything you think you know.”

At least, he said something like that (memories fade and this was over a decade ago). The scary thing is, I think he’s right. I’ve been into computers since I was nine years old, and I haven’t seen the industry slow down by a single MHz. Technology is getting cheaper and more accessible by the minute, and innovation is happening faster and faster; slow down in a field like this one, and you might as well retire.

Welcome to my new blog – stay a while. I hope you like it here :)

Ubuntu as your hypervisor

Ubuntu is a free server operating system that is easy to maintain and build on.  I’m a big fan; and recently, I’ve been using it to run our development environments at the office.  If, like us, you’re looking to build a low-cost environment for non-production work, here’s an article that may be a useful start.

Note that many production environments do run on KVM, but the setup I describe below would need some tweaking, especially on the hardware side, before it would be ready for that. I do touch on production versus staging considerations throughout the article, but there are some fundamental aspects that I do not cover and which should at the very least be considered for production: redundant compute nodes, redundant gigabit switches that are separate from your LAN switches, jumbo frames, disabling of multicasting, an iSCSI SAN with snapshotting and replication capability, not to mention your hardware’s scalability. So while this article is sufficient for development and testing, please consider it incomplete for a production environment. Ye be warned.

 

Getting Ubuntu up and running
  • To get started, you’ll need Ubuntu Server. If you’re planning to use the server for production, download the latest LTS; otherwise, you can just get the latest version.
  • You’ll need the hardware to run the hypervisor, of course. Make sure that the machine that you use has ample CPUs (64-bit processors with hypervising instructions), memory, and storage space (RAID-1 15K SAS should be sufficient for a production environment as long as you’ve got a SAN for storage; if your physical host is also supposed to be storing the VMs, I assume this is a test or dev environment, in which case I’d recommend at least two SATA drives in RAID-0). If you have spare hardware, I would definitely recommend setting up one machine as an iSCSI or NFS SAN instead.
  • Run the Ubuntu Server installation on your compute node. I won’t walk you through the installation, as this is not a KB article on setting up Ubuntu. However, do make sure you at least consider the following:
    • You may wish to set up your storage partitions as LVM so that you can add disks later (that is, if you’re using your compute node as a storage device as well)
    • When prompted for the services to install, you should at least set up OpenSSH and VM Server services.
  • Once your system is installed, you may wish to set up your public-key authentication. You can find information on how to do this in PuTTY here: http://www.ualberta.ca/CNS/RESEARCH/LinuxClusters/pka-putty.html
  • Make sure that you have all the necessary packages: apt-get install qemu-kvm libvirt-bin ubuntu-vm-builder bridge-utils openssh-server virt-manager convirt
  • Set up your PuTTY server profile (skip for Linux users):
    • Specify the host address
    • Specify the auto-login username in Connections > Data
    • Enable X11 forwarding in Connections > SSH > X11
    • Specify the private key file to be used in Connections > Auth (if applicable)
    • Make sure you have Xming installed. It needs to be running when you run PuTTY.
  • If you’re using Linux, when you connect via SSH be sure to specify the X11 parameters and public key parameters like so: ssh <host> -X -i <private key file>
    Private keys can be generated using ssh-keygen as described here:
    https://help.ubuntu.com/community/SSH/OpenSSH/Keys
  • At this point, your host should be ready to be used. You can create the VM in two ways:
    • run virt-manager and create the machine using the GUI
    • run ubuntu-vm-builder with syntax like this: sudo ubuntu-vm-builder kvm hardy --addpkg vim --mem 256 --libvirt qemu:///system
Tools
The tools below are for Windows clients only; they are not needed for Linux as the functionality is built-in:
PuTTY – Windows SSH client with some nifty advantages: you can create server profiles, set them to use public key authentication, enable X11 forwarding and TCP tunneling, all from a GUI.
Xming – Windows X server that allows you to run Linux graphic applications remotely over SSH. Used with PuTTY, you can run apps such as ghex or gedit from your Windows machine.
The tools below are for managing the hypervisor. They are Linux applications, which is why you need the above tools if you’re running Windows.
virt-manager – GUI for creating, starting, stopping or moving VMs.
virsh – Command-line equivalent of virt-manager. Practical when you just want to start or shut down a VM.
Useful commands:
virsh list --all → lists all machines on the host, whether they are running or not
virsh start <machine name> → starts the machine
virsh shutdown <machine name> → attempts to gracefully shut down a VM
virsh suspend <machine name> → pauses the VM
virsh destroy <machine name> → forces the VM off
virsh can be used to migrate machines live from one host to another. Use this syntax:
virsh migrate --live <name of the machine> qemu+ssh://<destination physical host name>/system
convirt – similar to virt-manager, this GUI tool purportedly allows you to drag & drop VMs from one server to another. Still under evaluation.
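If you end up migrating machines regularly, the virsh migration syntax above can be wrapped in a few lines of Ruby. This is only a sketch: the helper name, the VM name and the host name are hypothetical, and the snippet just builds and prints the command rather than running it.

```ruby
require "shellwords"

# Build the live-migration command described above as an argument list.
def migrate_command(vm_name, destination_host)
  ["virsh", "migrate", "--live", vm_name, "qemu+ssh://#{destination_host}/system"]
end

# Hypothetical VM and host names, for illustration only.
cmd = migrate_command("devbox01", "kvm2.example.lan")
puts Shellwords.join(cmd)
# To actually run the migration: system(*cmd)
```

Passing the arguments as a list to `system` avoids having to quote machine names by hand.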
Next Steps
Here are a few next steps that you may wish to consider for enhancing your hypervised environment:

Set up NFS4 shares, so that you can share VMs and migrate them from one compute node to another: https://help.ubuntu.com/community/NFSv4Howto
Set up a bridge so that your VMs can use the LAN: https://help.ubuntu.com/8.04/serverguide/C/libvirt.html

NOTE FOR LINUX MACHINES: when cloning a Linux machine, don’t forget to delete /etc/udev/rules.d/70-persistent-net.rules: http://muffinresearch.co.uk/archives/2008/07/13/vmware-siocsifaddr-no-such-device-eth0-after-cloning/

Extracting install files uploaded to Kace

I’ve been working on Kace more and more recently, and I have come to realise that once you upload a binary file for a managed installation, you can’t download it again… At least, not easily. The following is one possible way for you to extract your binary back out of your Kace K1000 box, which is practical if you’ve lost or deleted your original file and do not wish to lose your work!

In order to proceed, you need to know a little about XML and how files work. You’re going to be working with a hex editor; if you’re not comfortable with that, you may wish to reconsider undertaking this little manipulation.

First, log into your Kace admin console over the web, then go to Settings > Security Settings. Scroll down to the Samba section and enable file sharing by ticking on the corresponding checkbox, and setting the admin password. Next, go to Settings > Resources > Export K1000 resources. Select your managed install package and under Actions, click on Export to Samba Share. This will effectively export your entire managed installation package to the \\k1000\clientdrop share.

Kace saves the configuration and binaries in a format that is relatively easy to read — a compressed XML file. It is saved as a file of extension .KPKG; if you rename the file to .ZIP, you can extract the underlying XML file to a location where you can work on it.

As mentioned before, you’ll need a hex editor in order to proceed. When working with Windows, I’ve used Olly even if it’s not really intended as a hex editor. If you’re a Linux buff, ghex is a great little tool, very simple and straightforward. For my experimentation, I went with HxD, which is free and is very much like ghex in terms of its simplicity.

Open up the XML and locate the beginning of your file. This is relatively simple if you’re used to working with raw files; if you’re not, you may find that this site might help you. I suspect that most of your binaries, like mine, will be self-extracting files (in other words, executables), in which case the file header that you’re looking for is “4D 5A” (that’s “MZ” in ASCII). If you cut away everything in the XML file that comes before that header, you should be good to go!
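If you’d rather script the cut than do it by hand in a hex editor, here is a rough Ruby sketch of the same idea. It assumes the binary sits in the extracted XML file as raw bytes and that the first “MZ” in the file really is the start of the executable; both are assumptions worth checking in a hex editor first, and anything that follows the binary in the XML still has to be trimmed by hand. The helper name and the command-line arguments are made up for illustration.

```ruby
# Return the bytes of `data` starting at the first "MZ" (0x4D 0x5A) marker,
# or nil when no marker is present.
def bytes_from_mz(data)
  binary = data.b                  # work on raw bytes, whatever the encoding
  offset = binary.index("MZ".b)
  offset && binary.byteslice(offset..-1)
end

# Hypothetical usage: ruby extract_mz.rb <extracted xml file> <output exe>
if ARGV.length == 2
  payload = bytes_from_mz(File.binread(ARGV[0]))
  abort "No MZ header found in #{ARGV[0]}" unless payload
  File.binwrite(ARGV[1], payload)
end
```

Reading and writing in binary mode matters here: any encoding or newline translation would corrupt the executable.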

"Inventory fun" follow-up… AKA Kace’s built-in service tag and warranty report

Back in September, I wrote a script to grab Dell machines’ warranty information based on their service tags, which I had retrieved from Kaseya or LANSweeper. A pretty nifty trick, or so I thought…

Since then, I’ve been introduced to Kace — a suite of Dell tools for inventory management, scripting, software and patch deployment, and ghosting. Think of it as a solution that offers the functionality of your FOG server, LANSweeper and Kaseya.

The Kace solution is actually divided into two parts: one component handles inventory, application and update deployment, and scripting, while the other component handles ghosting and driver management. A “component”, in this context, can be a piece of hardware (a physical server that you connect to your LAN – the O/S is a custom Unix distro) or simply a virtual machine (a VMware application which can run on your existing ESX or ESXi box). The config is rather light: the only thing these devices need is storage space. For those of you who are looking for a free / SOHO-level solution for all your sysadmin needs: stick to OCS Inventory, Zabbix, and FOG… These things have a price tag. However, though not free, they are well worth it in my opinion.

I digress. I’ve been porting a lot of my existing stuff to Kace recently; this week, I’m working on the inventory report from September. It turns out that my work is pretty much done: perhaps unsurprisingly, Kace already has a report for extracting machine names, service tags and expiry dates. They have two boiler-plate reports, dubbed “Dell Warranty Expired” and “Dell Warranty Expires in the next 60 days”, which dish out all the information you may need in HTML, CSV or TXT format.

In point of fact, I needed something a bit more customized; I don’t actually need the full warranty information but rather the date the warranty expires. This is because with our clients, the machines are amortized from the moment we get them to the moment they’re no longer covered under warranty.

The nice thing about Kace is that you can take a boiler-plate report, duplicate it and change the SQL request directly, like with LANSweeper. This makes reporting a cinch. Here’s my final report for all machines on my campus:

SELECT DISTINCT M.NAME AS MACHINE_NAME, M.CS_MODEL AS MODEL, DA.SERVICE_TAG, DA.SHIP_DATE, M.USER_LOGGED AS LAST_LOGGED_IN_USER,
DW.END_DATE AS EXPIRATION_DATE
FROM KBSYS.DELL_WARRANTY DW
LEFT JOIN KBSYS.DELL_ASSET DA ON (DW.SERVICE_TAG = DA.SERVICE_TAG)
LEFT JOIN MACHINE M ON (M.BIOS_SERIAL_NUMBER = DA.SERVICE_TAG OR M.BIOS_SERIAL_NUMBER = DA.PARENT_SERVICE_TAG)
WHERE M.CS_MANUFACTURER LIKE '%dell%'
AND M.BIOS_SERIAL_NUMBER != ''
AND DA.DISABLED != 1
AND DW.END_DATE = (SELECT MAX(END_DATE) FROM KBSYS.DELL_WARRANTY DW2 WHERE DW2.SERVICE_TAG = DW.SERVICE_TAG AND DW2.SERVICE_LEVEL_CODE = DW.SERVICE_LEVEL_CODE);

It gives you a nice simple list with the machine name, model, service tag, shipment date, warranty expiry date, and the user that last logged on to the system.  Cool eh? Or maybe I’m just easily impressed. Regardless, it saves me time… Yay!

Oh joy – wifi issues on Windows 7 (part 2)

Last week, I thought I had discovered the reason why my wifi profiles weren’t sticking on my Windows box. Yeah, that didn’t do squat – the quest continues.

I used SysInternals’ Process Monitor to try and make heads or tails of what’s going on, but found little useful information. As far as I can tell, two executables are of interest to me: ZCfgSvc7.exe (this is the Intel wireless zero config service) and WLANExt.exe (the Windows WLAN config tool). These do two things: they read and write registry entries consistent with this nifty SANS forensics article, and they write to files in C:\ProgramData\Intel\Wireless\WLANProfiles and C:\Users\[user name]\AppData\Roaming\Intel\Wireless\WLANProfiles\. The registry is where basic wifi profile information is stored, and the files to which those executables write appear to be where the WEP and WPA keys are stored (playing connect-the-dots here, but the filenames are ITProfil.enc and profiles.enc). There are two sets of files: one in c:\ProgramData\… and one in c:\User… The first set is pretty much empty, but the second set seems to store the SSID and key. Can somebody confirm this?

I figured that if the config had been messed with (I did install a 3G dongle a few weeks ago…), it might be fixed by reinstalling the drivers. So that’s what I’ve done.

Does anybody know whether it’s the O/S’s responsibility to store those keys, or whether it’s the vendor’s responsibility?

Oh joy – wifi issues on Windows 7

This has been driving me nuts lately… Ever since I went back on Windows 7 I’ve had a crap-load of trouble with it storing my wifi connections. I have at least 5 locations that I visit on a regular basis and as you can imagine it’s a pain to have to re-key the wifi password every time.

I think I’ve finally found the source of my problem: I have multiple virtual adapters for each of my VPN technologies – these seem to prevent my wifi adapter from kicking in and finding my connection.

How do you fix this? Go into the Network and Sharing Center, hit “change adapter settings”, press the ALT key and go into Menu > Advanced Settings. In the Adapters and Bindings tab, move the adapters around so that your ethernet adapter is first, then your wireless connection, then all the other adapters.

Now, that seemed to do the trick for me. Will let you know if this sticks, but if anyone has any other suggestions (apart from the obvious and inevitable switch back to Linux) I’m all ears!!!

I promised myself that I would never return to a Windows environment. I find myself breaking that promise because of development needs… With some luck this is a temporary situation :)

Quick analysis of a trojan targeting Swiss users


We’ve seen a couple of cases of this trojan hitting client computers lately; unfortunately, the security bulletin by the CYCO doesn’t have much yet in terms of information on IP addresses, domain names, or what else the trojan might be doing in the background, so I dusted off the old forensics toolkit and did a bit of digging.
Look at this bad boy! Innit unreal? Brilliant :) I knew this kind of stuff was around but I must admit it’s the first time I’ve encountered ransomware this targeted…

My colleague confirmed that this was only happening on the user’s account – not the local admin account present on the computer. So the first thing we did was run Sysinternals’ Process Monitor to identify what was causing the screen to appear. Note that we use Deep Freeze on users’ computers and the machine was frozen at the time of the infection, so it was likely that what was running was persisted on the user’s drive. I really wish that we could freeze everything but the user’s Desktop, My Docs, and Favorites – however, that seems to royally piss off our users. Would have prevented this from happening though.  Anyway, moving on. If you know that the only location where this executable could possibly exist is the user’s drive, it’s easy to identify the culprit:

No big surprise there: it’s running in the user’s Temp folder. Unsurprisingly as well, the Software\Microsoft\Windows NT\CurrentVersion\Winlogon key in the user’s registry hive has been modified to point the shell to that upd executable; that’s easily sussed out by using regripper or regdump. With regripper, we even get a timestamp of when this was done, which will be useful for cross-referencing information later.
OK great, so now we know where this thing is – how did it get there?
It was a bit harder to figure out how the hell the trojan got on the user’s computer, I’ll admit. I used Web Historian at first to identify any suspicious sites. I don’t know about the rest of you out there, but my experience is that when malware shows up on users’ computers, it’s typically because they’ve been downloading something illegal or, er, carnal. However, when looking at the user’s web history no alarm bells were going off. All good, clean, unremarkable sites. I went as far as to investigate the user’s mail store to see if the machine could have gotten infected by email – nothing suspicious there either. USB keys would have left a trace in the registry but since the machine was frozen, I wouldn’t be able to figure out if a key was inserted at the time of the infection. I therefore switched tactics and ran a timeline analysis of the user drive using sleuthkit. That’s when I found this:

The same minute the executable was written, something was written to the Java cache. Coincidence? Yeah right. I took a look at the index file, guess what I found?

If you decompile the JAR using jad, you get something like this:

If you check out the domain and IP address written in the index file, you’ll see that the domain is registered to a Russian registrant; the IP address traces back to the domain, but is hosted in the Netherlands.
That’s all the JAR file seems to do. I haven’t messed around with the upd.exe file yet, will probably do so sometime soon. In the meantime, I hope that you found this entertaining :-D Should I be looking at anything else? Let me know.

Ironkey settings stick, even in read-only mode

I am writing this post as a bit of a sanity check; perhaps someone out there can help me by comparing notes or providing explanations :)

Yesterday, I was using my IK to perform a memory dump for forensic analysis on a system infected with a trojan. I’ve used a CD for this in the past but figured “why not just use my IK in read-only mode”, so I popped my IK in, making sure I ticked the read-only mode checkbox. No problems there, of course. Performed a memory dump, which I wrote to a throw-away USB stick, then ejected my IK.

You know how your settings stick from one session to another? I figured this was recorded when the IK checked into the management console. However, when I popped my IK into another machine this morning, I noticed that the settings had stuck.

When I do my forensic analyses, they are in a different location than client sites – this is why I am 100% certain that the machine was not connected to the Internet – wifi was off in any case (though the wifi switch on laptops is sometimes software-managed) but even if it were on, the machine wouldn’t have any AP to connect to. No ethernet or bluetooth connection either, of course.

My theory, therefore, is that the settings are stored on some RW volume on the IK. Can anyone tell me more about this? Is there some part of the manual that I’ve overlooked? What gets written to that volume? What FS does it have, and can it be infected with malware? This would be disconcerting.

Any insight would be very much appreciated :)

Fun with RSS feeds and hash tags

Here’s a fun tip I had thought about but never gotten round to looking into: making the most of Twitter and your RSS feed reader!

I follow quite a few people on Twitter now. It helps me keep up-to-date with interesting articles in my field, find out what peers are doing, catch up on the latest funnies — but there comes a time when you’re following so many people that your stream becomes simply unmanageable — you just don’t have enough time to read through everything. And perhaps that’s the point; you’re just meant to quickly skim through the information, trust your subconscious to pick out items of interest.

And then, there are a few tricks that help keep you focused, like the one posted by Mark Sample on ProfHacker: it allows you to view items marked with hashtags as an RSS feed:

http://search.twitter.com/search.atom?q=%23xxxx

Here’s another one I like, as seen on SEO Alien – RSS feed for a particular person:

http://api.twitter.com/1/statuses/user_timeline.rss?screen_name=xxxx
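If you want to generate these feed URLs for a whole list of hashtags or users, a few lines of Ruby will do. This is just a sketch built around the two URL patterns quoted above; the helper names are made up, and it assumes those endpoints are still being served when you try them.

```ruby
require "uri"

# Build the hashtag search feed URL quoted above; the "#" is encoded as %23.
def hashtag_feed_url(tag)
  "http://search.twitter.com/search.atom?q=" +
    URI.encode_www_form_component("##{tag}")
end

# Build the per-user timeline feed URL quoted above.
def user_feed_url(screen_name)
  "http://api.twitter.com/1/statuses/user_timeline.rss?screen_name=" +
    URI.encode_www_form_component(screen_name)
end

puts hashtag_feed_url("xxxx")
puts user_feed_url("xxxx")
```

Encoding the parameter rather than pasting it in keeps the URL valid for hashtags or names containing characters that are not URL-safe.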

Now I could be wrong about this, but I don’t think the API implements a search for content limited to the people you follow. Presumably, you could write a Ruby script that does this, though – or use your Google-fu. Observe:

site:twitter.com @brucon volunteer

will return all tweets from (or mentioning) the user @brucon, with the word “volunteer” in them. You can toss that into a Google alert as a feed (www.google.com/alerts), et voilà! Filtered Twitter feeds. Note: the above search works fine on google.com but when I put it through Google alerts, it gave me a smaller set of results – ye be warned.