Load data into Palo OLAP cubes with Talend tPaloDimension and tPaloOutputMulti

Palo provides an interesting in Memory OLAP Server that uses MDX (XMLA), Excel or Open Office as query tool. The resulting documents (Excel or Open Office worksheets) can be published on a web portal and shared with users. With Talend you can extend Palo default ETL tool with a comprehensive data integration solution and achive more control on data processing.

Being based on an in memory technology Palo is quite different for a common Relational OLAP model (for example Mondrian). This gives some advantages (speed), some disavantages (when cubes structure changes data must be reloaded) and require a somewhat different approach on OLAP cubes definition.

 

Palo Cube preparation

Before proceding with the data loading in Palo is necessary to build a Palo database with a cube and some dimensions. The article creating Palo OLAP cubes and dimensions with Talend covers a Palo OLAP cube creation usingTalend Open Studio. In alternative you can use the web tool Palo Cube Manager in Palo Suite

The Palo structure used in this article to load the data is:

  • Database: "MyDB"
  • Dimensions: "Customers", "Categories", "Measures" (with "ListPrice" and "Quantity" as numeric elements)
  • Cube: "MyCube" containing "Customers", "Categories", "Measures" dimensions

 

Populate Customers Palo dimension with tPaloDimension

The first step is to populate Palo dimensions. To load data into a "Customers" dimension we use a datasource linked to a tMap and a final tPaloDimension component, as in the next picture.

 

Populate Palo OLAP dimension with Talend tPaloDimension

 

The datasource (for example a tMySQLInput) reads customers data in a column named "Customers".

The tMap component is necessary to add a new column named "AllCustomers" to the flow. This column contains only a constant value of "All Customers". "All Customers" will be used as parent member (root) for all the Customers. The tMap component is configured as follow:

 

Populate Customers dimension with tMap

 
The final step is to link the tMap to a tPaloDimension component configured as follow:
Database: "MyDB"
Dimension:"Customers"
Action on dimension:"Create dimension if not exists"
Create dimension elements: checked
Consolidation type: Self-Referenced
Edit schema: copy the same schema as tMap output
Input column:
AllCustomers; Parent
Customers; Child
 
The parameter Self-referenced and the hiearchy definition allows Palo to define a dimension with "All Customers" as parent member. After running the job and loggin in the Palo OLAP Manager (http://localhost:8080) the result should be as follow:
 
Palo OLAP manager consolidated dimension
 
If somethig goes wrong make sure a database called "MyDB" exists, created manually by the user.
 

Using tPaloDimension to populate Categories dimension

For the "Categories" dimension the steps are like the previous ones. Simply place a datasource for Categories, link it to a tMap, add a field "AllCategories" to the flow with a constant value of "All Categories". Then link the tMap with a tPaloDimension component. The final result should be as follow:
 
Load aggregated dimensions in Palo with Talend
 

Load a Palo data cube with Talend

The final step is to load a Palo Cube using Talend tPaloOutputMulti component. Similar results can be achived with tPaloOutput component, but tPaloOutputMulti gives more flexibility during the loading process.
 
Palo stores OLAP cubes in a in memory optimized multidimensional array, this is quite different from a common relational approach using for example a star schema.
 
From our data source we'll add all the dimensions columns and only a measure column at time. For example the first records load contains the Customer, Category and ListPrice values. The second data load contains the Customer, Category and Quantity values, that is only a measure at a time with all its dimensions. Palo composes the cube adding the measure on every load in a in memory cartesian product.
 
In a new Talend job put a datasource component (like tMSSQLInput) linked to a tMap and a final tPaloOutputMulti component, like in this picture.
 
Load Palo Cube with Talend tPaloOutputMulti
  
The tMSSQLInput component load the cube data from a relational database. The tMap component maps a subset of this dataset containing only a measure, in this case "ListPrice". In the tMap add the Categories and Customers columns and a column named "Measures" with a constant value of "ListPrice". This column tells Palo we are using ListPrice as current Measure. The last output field is the measure value, in this case ListPrice. The result should be as follow:

Load Palo cube with talend tMap

The last component in the job is a tPaloOutputMulti, with these values:

Database: "MyDB"
Cube: "MyCube"
Schema: use the same schema as tMap output
Measure value: ListPrice
Add values: checked

Next save and run the job, the Palo Cube should contains some values like in this:

 load Palo cube with talend result

If you link the Palo cube in a Excel (or OpenOffice) spreadsheet you'll notice that Categories, Customers and ListPrice are correctly displayed, while the Quantity value is always 0. 

In the next step we'll add the values for Quantity in a similar way, replacing the value for the field "Measures" with "Quantity" in the tMap and using the field "Quantity" as last field instead of "ListPrice". 

The tMap should be configured as follow:

Talend load Palo measures

 

And the tPaloOutputMulti should be configured with the following values:

Database: "MyDB"
Cube: "MyCube"
Schema: use the same schema as tMap output
Measure value: Quantity
Add values: checked

 

Talend Palo Cube tPaloOutput

 

A very important parameter is Add values: checked. This allows the measures to be added in consecutive steps. In the first steps all Quantity values are 0 because only ListPrice values are set. In the next step the values of Quantity will be added to the cube.

Exploring the Palo cube from Excel the result should be as follow:

Talend Palo Cube Excel final result

 

Measures values in the current version of Palo have two type of format: text (string) and numeric (double). Other format like integer, datetime, bigint should be converted to string or double.

tags: