Software, science and random thoughts about IT
Despite the introduction of tools such as Puppet, Vagrant, Apache Maven, Jenkins and many others that automate the job away, a lot of software development teams still rely on outdated processes and manual labor to perform the bulk of the delivery.
Unsurprisingly, the excuses for relying on outdated development practices haven’t changed either:
What I want to point out is that, more than just laying out algorithms in a text file, delivering great products involves processes, automation and discipline (see the note below). Just like a pit stop in a Formula 1 race:
(An outdated, manual and loosely disciplined approach versus a modern, automated and highly disciplined approach).
Note: discipline as in a systematic, ordered approach to development, not to be confused with blindly following rules or unquestioning behavior.
This week I needed to show a colleague how to use Apache Camel, Apache CXF and Spring to create a web-based integration application. To do so, I created a Camel-based implementation of the Simple Apache CXF examples I wrote in 2012. Although this topic is covered more than once in the Camel documentation, some details are either missing, which can make it tricky to run this setup the first time, or are specific to the application server where the code will run.
Therefore, I created this example (which you can find in this repository in my GitHub account) to complement the official documentation with additional details. I used the open source GlassFish application server to run the code.
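To give a flavor of how the pieces fit together, here is a minimal, hypothetical sketch of a Camel route consuming from a CXF endpoint (the helloEndpoint bean would be declared as a cxf:cxfEndpoint in the Spring XML; the names are illustrative, not the exact code from the repository):

import org.apache.camel.builder.RouteBuilder;

// Hypothetical sketch: consume SOAP requests from a CXF endpoint declared
// in the Spring context and produce a simple reply.
public class HelloRouteBuilder extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        // "cxf:bean:helloEndpoint" refers to the <cxf:cxfEndpoint> bean
        // declared in the Spring XML configuration
        from("cxf:bean:helloEndpoint")
            .log("Received request: ${body}")
            .transform(simple("Hello, ${body}"));
    }
}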
As I explained in an earlier post, Vagrant now supports Parallels as a provider. Since I wanted to test how they work together, I created a standard 64-bit Gentoo Linux box that you can download and use. In addition to a standard Gentoo install, the box also comes with Puppet installed, so you can do some actual work on it.
Assuming you already have the Parallels provider set up by now, this is how you can download and use the box:
vagrant init orpiske/gentoo-linux-64 && vagrant up
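If Parallels is not your default provider, you may need to request it explicitly when bringing the box up:

vagrant up --provider=parallels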
After the box is downloaded from the cloud, you can use Vagrant as usual (e.g. vagrant ssh).
Maybe this is not news anymore, but Vagrant now supports Parallels. It seems to work with Parallels Desktop 8 and above, but I wasn't able to run version 9 on OS X Yosemite. Upgrading to Parallels Desktop 10 seems to have fixed the issue and it worked like a charm. One additional problem is that there's a shortage of images in the Vagrant Cloud. Although I believe this will be fixed as the community grows and shares more templates on the cloud, it may be a nuisance to some users.
I have been using Logstash extensively lately. Along with ElasticSearch, it's a great tool to centralize logs and simplify access to them. The only difficulty I had was related to supporting multiline log messages, such as those printed by Java stack traces. I found some good examples online, but none worked the way I wanted. In some cases, my messages also got tagged as _grokparsefailure, which indicated that the parser failed to process the regex. I ended up with one that is not so different after all, but which matches exactly the way we log messages with log4j:
(^.+Exception.+)|(^\s+at .+)|(^\s+... \d+ more)|(^\s*Caused by:.+)
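For reference, this is roughly how the pattern plugs into Logstash's multiline filter (a sketch; depending on your setup you may prefer the multiline codec on the input instead):

filter {
  multiline {
    # Stack trace continuation lines are merged into the previous event
    pattern => "(^.+Exception.+)|(^\s+at .+)|(^\s+... \d+ more)|(^\s*Caused by:.+)"
    what => "previous"
  }
}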
I am currently working on a data analytics website for my own educational purposes and, to fulfill my hacking/learning needs, I decided to use Apache Cassandra as the input/output storage engine for an Apache Hadoop map/reduce job.
The job in question is as simple as it gets: it reads the data from a table stored in a Cassandra database and identifies the most commonly used adjectives for each of the major communication service providers (CSPs) in Brazil. After processing, the results are stored in another table in the same Cassandra database. Basically, it is a fancier version of the famous Hadoop word count example.
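To make the "fancier word count" concrete, here is a rough, hypothetical sketch of what the mapper can look like with the CQL3 input format, where each row arrives as maps of column name to raw bytes (the column names csp and message, and the isAdjective() helper, are made up for illustration):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Map;

import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: emits ("<csp>:<adjective>", 1) for every adjective
// found in a message, mirroring the classic word count pattern.
public class AdjectiveMapper
        extends Mapper<Map<String, ByteBuffer>, Map<String, ByteBuffer>, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(Map<String, ByteBuffer> keys, Map<String, ByteBuffer> columns,
                       Context context) throws IOException, InterruptedException {
        String csp = ByteBufferUtil.string(columns.get("csp"));
        String message = ByteBufferUtil.string(columns.get("message"));

        for (String word : message.toLowerCase().split("\\s+")) {
            if (isAdjective(word)) {
                context.write(new Text(csp + ":" + word), ONE);
            }
        }
    }

    private boolean isAdjective(String word) {
        // Placeholder: a real job would consult a part-of-speech
        // dictionary or tagger here.
        return false;
    }
}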
Unfortunately, there seems to be a lack of modern documentation about integrating Hadoop and Cassandra. Even the official guide seems to be deficient or outdated on this subject. To add insult to injury, I also wanted to use composite keys, which complicated things further. After reading the example code in the Cassandra source tree, I was able to successfully implement a working job.
Despite the lack of documentation and the hacking required to figure out how to make it work, the process is quite simple, and even an inexperienced Cassandra/Hadoop developer such as myself can do it without much trouble. In the paragraphs below you will find additional details about the Hadoop and Cassandra integration and what is required to make it work.
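The wiring that took the most digging is the job configuration. Here is a sketch under the assumption of a CQL3 table with a composite key, using the input/output formats and config helpers from the org.apache.cassandra.hadoop packages of that era (keyspace, table and host names are placeholders, and the reducer, not shown, simply sums the counts per key):

import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlOutputFormat;
import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical job driver: reads from analytics.posts and writes counts
// back to analytics.adjective_counts through a prepared CQL UPDATE.
public class AdjectiveCountJob extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf(), "csp-adjectives");
        job.setJarByClass(AdjectiveCountJob.class);
        job.setMapperClass(AdjectiveMapper.class);   // sketched above
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Read side: CQL3 paging input over the source table
        job.setInputFormatClass(CqlPagingInputFormat.class);
        ConfigHelper.setInputInitialAddress(job.getConfiguration(), "localhost");
        ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
        ConfigHelper.setInputPartitioner(job.getConfiguration(), "Murmur3Partitioner");
        ConfigHelper.setInputColumnFamily(job.getConfiguration(), "analytics", "posts");

        // Write side: the output format binds the composite primary key
        // columns and the variables of the prepared UPDATE statement
        job.setOutputFormatClass(CqlOutputFormat.class);
        ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "localhost");
        ConfigHelper.setOutputPartitioner(job.getConfiguration(), "Murmur3Partitioner");
        ConfigHelper.setOutputColumnFamily(job.getConfiguration(), "analytics", "adjective_counts");
        CqlConfigHelper.setOutputCql(job.getConfiguration(),
                "UPDATE analytics.adjective_counts SET occurrences = ?");

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new AdjectiveCountJob(), args));
    }
}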
Finally, as is usual for my coding examples, the source code is available in my GitHub account under the open source Apache License v2.
These are just some development-related links and articles I have read in the last few weeks which I think are worth mentioning:
We all know that JSON and RESTful web services are the new darlings of the Internet and, to some extent, of backend development these days. Their simplicity over other mechanisms is, undoubtedly, a good thing. However, a large amount of backend development still relies (and will continue to rely) on SOAP and other mechanisms to provide services. That's why it's so important to understand them. This series of articles from IBM developerWorks can help you understand them:
On the other hand, if you want to understand the RESTful side of the force, you may want to read about Developing RESTful Services using Apache CXF.