Archive for the 'Coding' Category

Using Apache Cassandra with Apache Hadoop

I am currently working on a data analytics website for my own educational purposes. To fulfil my hacking/learning needs, I decided to use Apache Cassandra as the input/output storage engine for an Apache Hadoop map/reduce job.

The job in question is as simple as it gets: it reads data from a table stored in a Cassandra database and identifies the most commonly used adjectives for each of the major communication service providers (CSPs) in Brazil. After processing, the results are stored in another table in the same Cassandra database. Basically, it is a fancier version of the famous Hadoop word count example.
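The job's own code lives behind the "Continue reading" link, so here is a minimal, stdlib-only Java sketch of the word-count-style logic described above. It is not the original job. Hadoop's Mapper and Reducer are replaced by plain static methods. Two assumptions are made: each input row is a (CSP, text) pair, and adjectives are recognised with a small hypothetical word list (`ADJECTIVES`) rather than a real part-of-speech tagger. The names `AdjectiveCount`, `map`, `reduce` and `mostCommon` are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Minimal sketch of the "adjective count" logic; NOT the original Hadoop job.
// Assumption: adjectives are detected with a small hypothetical word list.
class AdjectiveCount {

    // Hypothetical adjective list; a real job would need a proper POS tagger.
    static final Set<String> ADJECTIVES =
            Set.of("slow", "fast", "cheap", "expensive", "good", "bad");

    // Map step: emit one (csp, adjective) pair per adjective found in a row.
    static List<Map.Entry<String, String>> map(String csp, String text) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        // Split on anything that is not a letter; \p{L} keeps accented
        // (e.g. Portuguese) letters inside words instead of splitting on them.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (ADJECTIVES.contains(token)) {
                pairs.add(Map.entry(csp, token));
            }
        }
        return pairs;
    }

    // Reduce step: count occurrences of each adjective, grouped by CSP.
    static Map<String, Map<String, Integer>> reduce(List<Map.Entry<String, String>> pairs) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            counts.computeIfAbsent(pair.getKey(), csp -> new TreeMap<>())
                  .merge(pair.getValue(), 1, Integer::sum);
        }
        return counts;
    }

    // Most frequent adjective for one CSP, or null if there are none.
    static String mostCommon(Map<String, Integer> adjectiveCounts) {
        return adjectiveCounts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}
```

In a real Hadoop job, the map step would run inside a Mapper. The framework would group the emitted pairs by key during the shuffle, and the summing would happen in a Reducer (optionally also in a combiner).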

Unfortunately, there seems to be a lack of up-to-date documentation about integrating Hadoop and Cassandra. Even the official guide seems to be incomplete/outdated on this subject. To add insult to injury, I also wanted to use composite keys, which complicated things further. After reading the examples in the Cassandra source code, I was able to implement a working job.
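Composite keys mostly affect how the reducer's output is shaped. The sketch below assumes the CQL output format from Cassandra's Hadoop integration (`CqlOutputFormat`). In that format, the output key is a `Map<String, ByteBuffer>` from primary-key column names to serialized values. The output value is a `List<ByteBuffer>` bound, in order, to the `?` placeholders of an `UPDATE` statement configured for the job.

Treat that as an assumption about the API shape, not the author's code. The table layout `PRIMARY KEY ((csp), adjective)` and the column names `csp`, `adjective` and `total` are hypothetical. Only standard Java is used here, so the sketch can be tried without Cassandra on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the key/value shape for a row with a composite primary key.
// Assumed (hypothetical) table: PRIMARY KEY ((csp), adjective), plus an
// UPDATE statement along the lines of "UPDATE ... SET total = ?".
class CompositeKeyRow {

    // Serialize a text column as UTF-8 bytes.
    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Serialize a CQL int: 4 bytes, big-endian (ByteBuffer's default order).
    static ByteBuffer int32(int v) {
        return ByteBuffer.allocate(4).putInt(0, v);
    }

    // One entry per primary-key column: partition key first, then clustering column.
    static Map<String, ByteBuffer> keys(String csp, String adjective) {
        Map<String, ByteBuffer> keys = new LinkedHashMap<>();
        keys.put("csp", utf8(csp));
        keys.put("adjective", utf8(adjective));
        return keys;
    }

    // Values bound, in order, to the '?' placeholders of the UPDATE statement.
    static List<ByteBuffer> values(int total) {
        return List.of(int32(total));
    }
}
```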

Despite the lack of documentation and the hacking required to figure out how to make it work, the process is quite simple, and even an inexperienced Cassandra/Hadoop developer such as myself can do it without much trouble. In the paragraphs below you will find additional details about the Hadoop and Cassandra integration and what is required to make it work.

Finally, as usual for my coding examples, the source code is available in my GitHub account under the open source Apache License v2.

Continue reading ‘Using Apache Cassandra with Apache Hadoop’

Development Goodies

These are just some development-related links and articles I have read in the last few weeks which I think are worth mentioning:

Understanding web services specifications (and more)

We all know that JSON and RESTful web services are the new darlings of the Internet and, to some extent, of backend development these days. Their simplicity, compared with other mechanisms, is undoubtedly a good thing. However, a large amount of backend development still relies (and will continue to rely) on SOAP and other mechanisms to provide services. That’s why it’s so important to understand them. This series of articles from IBM developerWorks can help you understand them:

On the other hand, if you want to understand the RESTful side of the force, you may want to read about Developing RESTful Services using Apache CXF.

Enterprise Integration with Apache Camel

I’ve just published a mini e-book, in Portuguese, about Enterprise Integration with Apache Camel. If you happen to speak Portuguese, you can download it here.

Quick tips for running Java applications on OpenShift

Apache Commons Configuration:

It’s pretty common to need to set a hostname or a port for your service in OpenShift. If you’re using Apache Commons Configuration, there’s a quick and easy way to access the variables exported by the cartridges: you can address the environment variables using the ‘env’ prefix.
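In Commons Configuration, the ‘env’ prefix is used through variable interpolation. A property value such as `http.host = ${env:OPENSHIFT_DIY_IP}` is resolved from the environment when it is read.

The variable name is an assumption. OpenShift cartridges export variables along the lines of `OPENSHIFT_<CARTRIDGE>_IP` and `OPENSHIFT_<CARTRIDGE>_PORT`, but check which ones your cartridge actually defines.

The stdlib-only Java sketch below mimics that substitution so the mechanism is visible. `EnvInterpolation` and `resolve` are hypothetical names, and this is not how Commons Configuration implements it.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "${env:NAME}" interpolation.
// It mimics the idea of the 'env' prefix; it is NOT Commons Configuration code.
class EnvInterpolation {

    // Matches ${env:NAME}, capturing NAME (a typical environment variable name).
    private static final Pattern ENV_REF =
            Pattern.compile("\\$\\{env:([A-Za-z_][A-Za-z0-9_]*)\\}");

    // Replaces every ${env:NAME} with env.get(NAME); unknown names are left as-is.
    static String resolve(String value, Map<String, String> env) {
        Matcher m = ENV_REF.matcher(value);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String replacement = env.getOrDefault(m.group(1), m.group(0));
            // quoteReplacement stops '$' or '\' in the value from being treated as group refs.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Convenience overload that reads the real process environment.
    static String resolve(String value) {
        return resolve(value, System.getenv());
    }
}
```

Taking the variables from a `Map` keeps the sketch testable. On the server, the one-argument overload reads whatever the cartridge exported.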

Continue reading ‘Quick tips for running Java applications on OpenShift’

NoSQL: links for beginners

NoSQL databases are some of the hottest topics in the IT industry at the moment. A beginner can easily feel swamped by the amount of documentation available. Since I am a NoSQL beginner as well, I picked two links which I revisit every now and then:

A Visual Guide to NoSQL explains how the commonly used NoSQL offerings relate to the CAP theorem.

A Beginner’s Guide to NoSQL is an article, originally written for the Software Developer’s Journal, that explains the basic principles and ideas behind NoSQL databases.



Running the Simple Apache CXF Server Example on Red Hat OpenShift

Today I dedicated some time to learning about OpenShift, Red Hat’s Platform-as-a-Service offering. It allows us developers to quickly develop, deploy and provide scalable applications over the web.

To learn about it, I decided to deploy a really simple web application, and I thought it would be a good idea to deploy the Simple CXF Server example on my free account. You can see it in action here. Because the OpenShift documentation is quite extensive, it can be overwhelming for a beginner like me, so I decided to take notes of my steps while deploying a simple Apache CXF-based application.

These are the steps I had to do:

Continue reading ‘Running the Simple Apache CXF Server Example on Red Hat OpenShift’

SSPS: SDM Development and Latest Releases

You can find the documentation for the latest SDM version here. Also, version 0.2.3 is out and fixes a couple of annoying bugs. Check it out.

Toy Project: Download Server

In case you need an example of how to use any of these:

  • Unix Message Queues
  • Unix Sockets
  • POSIX Threads
  • LibCurl
  • Basic C usage

You may want to take a look at the source code of my toy project on GitHub. I don’t claim it is good, bug-free or even usable beyond what I need; quite the contrary, I don’t think I would show it at a job interview o.O. Anyway, feel free to check it out if you need a usage example of any of these technologies.

SSPS 0.2.0 is out

I just released another version of SSPS: no more XML files, no more WebDAV. Instead: Groovy-based scripts and Git or SVN repositories. Check it out.
