PUBLIC MARKS from ogrisel with tag documentation

November 2007

wiki.dbpedia.org : Documentation

The DBpedia community uses a flexible and extensible framework to extract different kinds of structured information from Wikipedia. The DBpedia information extraction framework is written in PHP 5 and is available from the DBpedia SVN (GNU GPL License). This page describes the DBpedia information extraction framework. The framework consists of the interfaces Destination, Extractor, Page Collection and RDFnode, plus the essential classes Extraction Group, Extraction Job, Extraction Manager, Extraction Result and RDFtriple.

October 2007

Writing An Hadoop MapReduce Program In Python - Michael G. Noll

Even though the Hadoop framework is written in Java, programs for Hadoop need not be coded in Java but can also be developed in other languages like Python or C (the latter since version 0.14.1). However, the documentation and the most prominent Python example on the Hadoop home page could make you think that you must translate your Python code into a Java jar file using Jython. Obviously, this is not very convenient and can even be problematic if you depend on Python features not provided by Jython. Another issue with the Jython approach is the overhead of writing your Python program in such a way that it can interact with Hadoop: just have a look at the example in ${HADOOP_INSTALL}/src/examples/python/WordCount.py and you will see what I mean. I still recommend having at least a look at the Jython approach and maybe even at the new C MapReduce API called Pipes, which is really interesting. Having said that, the ground is prepared for the purpose of this tutorial: writing a Hadoop MapReduce program in a more Pythonic way, i.e. in a way you should be familiar with.
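
To give a feel for what "a more Pythonic way" can look like, here is a minimal word-count sketch of the map and reduce steps in plain Python. It is not code from the tutorial: the function names (map_words, reduce_counts, run_local) are hypothetical, and the whole job is simulated in a single process rather than run on a cluster. It assumes the reducer receives its pairs sorted by key, which is what the framework's sort step between map and reduce provides.

```python
from itertools import groupby


def map_words(lines):
    """Mapper: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(pairs):
    """Reducer: sum the counts of consecutive pairs that share a word.

    Assumes the pairs arrive sorted by word (hypothetical stand-in for
    the sort that happens between the map and reduce phases).
    """
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort -> reduce in one process for quick testing."""
    return list(reduce_counts(sorted(map_words(lines))))
```

Calling run_local on a handful of text lines returns a list of (word, total) tuples ordered by word. On a real cluster the two functions would typically live in separate scripts that read from standard input and write tab-separated pairs to standard output.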

September 2007

Power PostgreSQL - PerfList

This is a set of rules of thumb for setting up your PostgreSQL 8.0 server. Much of what follows is based on anecdotal evidence or practical scaling tests; there's a lot about database performance that we, and OSDL, are still working out. However, this should get you started. All information below is useful as of January 12, 2005 and will likely be updated later. Discussions of settings below supersede the recommendations I've made on General Bits.

ogrisel's TAGS related to tag documentation

api, concurrent, database, framework, hadoop, information, java, knowledge extraction, MapReduce, memory, performance, php, postgresql, python, rdf, semantic web, tuning, web of data