Posts

Managing data access policies in Hive

To efficiently manage access policies in our Hive Hadoop cluster at Svenska Spel, we developed a tool, Cobra-policytool. Instead of managing our policies in Apache Ranger's web interface and tagging information in Apache Atlas, policy management is now integrated into our CI/CD pipeline. We can use our normal development process for our policies too. In this post I will describe the background to Cobra-policytool and demonstrate how to use it. We recently open sourced Cobra-policytool to be able to collaborate with others. Earlier this year Svenska Spel migrated our Hadoop cluster to a new cluster using Hortonworks HDP. At the same time we decided to use Kerberos, Apache Atlas, and Apache Ranger to get good security and powerful access control. If you do not know what Atlas and Ranger do, I recommend taking a look at my talk from DataWorks Summit Europe 2018. With Atlas and Ranger in place, the next step was to implement and enforce our policies. Soon we...

Upgrade to Ubuntu 10.10 and ATI-graphics

If you read this before you upgrade to Ubuntu 10.10: remove or deactivate all your binary drivers before you upgrade. If you read this after upgrading to Ubuntu 10.10 and get a messy screen where you cannot see anything and can't log in: reboot your computer in failsafe mode, select to become root, and run apt-get install fglrx. I also remember that, in the last installation phase of the upgrade, I got a complaint about a problem removing fglrx. Why do I suggest installing fglrx when we actually have a problem removing it? Well, it is easy. Since it was half installed and I could not remove it, I tried to install it again. I saw that it installed dkms and compiled a new version of the driver for my kernel. I will now try to run the open source variant instead. I hate closed source; it is nothing but problems.

The generalization of the day

I use the web to find solutions to my problems, as most people do today. I have noticed a huge difference between two communities I use, the Java community and the Linux community. By the Linux community I mean it in a broad sense: Linux distributions and common software running on Linux, such as Postgres, Apache, Evolution, F-spot, Python, etc. In other words, classic open source software. When I search for a problem in the Linux area, most of the time I get good hits from people having the same problem and good answers from people who know how to solve it. The hits are on mailing lists, forums, and blogs. On the other hand, when searching for problems around Java, most of the time I find people who have the same problem, but no answers. I know this is a very big generalization, but does anybody else experience the same? Why are people in classic open source more willing to help than in the Java world?

Rule of thumbs for API designing

I have just read a very interesting article, "API Design Matters" by Michi Henning. He discusses the problem with badly designed APIs and gives some rules of thumb. We have all at some time used an API that does not feel right and makes the coding hard. It is very interesting to read Michi's analysis of this and why it happens. My feeling is that he has hit the nail right on the head. Here is a summary of the rules he discusses. I assume you will feel like me when you read them: yeah, that's obvious. But that's the nice thing, and the important part is to get the list. An API: must provide sufficient functionality for the caller to achieve its task; should be minimal without imposing undue inconvenience on the caller; should be policy free if it is general purpose; should be policy rich if it is special purpose; should be designed from the perspective of the caller; should not let the caller configure "everything"; should be documented before it is implemented. If you want to read the discussion...

Moved to France

Not me, but my mail server has moved to France or, to be more correct, is now hosted by the French hosting provider Gandi.net. According to GeoIP it is in Paris. I rent a small virtual machine from them to host my web and mail server. This makes me less dependent on my broadband connection, and I avoid the problem that my IP address may change without notice. At Gandi I have an Ubuntu 9.04 Linux machine with 256 MB of memory and 8 GB of disk. This is more than enough for hosting my private mail and web. The mail server runs Postfix and Dovecot. I have my own certificate authority (CA) to handle secure connections from my mail clients.

Spring framework to Python

I just noticed that SpringSource has released the Spring framework for Python. Interesting, especially since I like the Spring framework in Java and I prefer Python as a language in a lot of cases. I must try it some day....

IPhone - not for me

The new iPhone looks nice. The only (?) problem is that, as far as I know, it cannot be fully used from a Linux/Ubuntu-only environment. They assume you have a Mac or MS Windows machine. I think I have to wait until someone starts selling Android phones in Sweden without requiring that the phone be purchased by installments.