Step 7. Setting up the database schema and ETL jobs

Configure the ETL (Extract, Transform and Load) jobs and create the database schema for SkyVault Analytics.
Before you start, make sure that your Data Integration (DI) database (MySQL or PostgreSQL) is set up and working. Use this information to configure Kettle (the tool that runs the ETL jobs) to use the DI database, and to connect to the SkyVault database and the Analytics server.
  1. Navigate to the Analytics installation directory. Use the data-integration/ETL/config/config.properties.sample file to create a kettle.properties file in <kettle-user-home>/.kettle/, where <kettle-user-home> is the home directory of the user that will run the Kettle scripts:
    1. In the SkyVault DATABASE section, enter the details required to read from the database used by your SkyVault installation.
    2. In the ETL (WAREHOUSE) DATABASE section, enter the details required to read from and write to the database that you created in Setting up the DI database.

      Important: The db_dialect attribute must be set to pgsql or mysql, depending on whether you have PostgreSQL or MySQL configured as your DI database.
    3. In the BA SERVER CONNECTION section, set ba_serverURL to the URL of the Analytics server; the ETL job uses this URL to flush the cache. For example, http://localhost:8080/ba-server.
    4. In the same section, set the ba_admin and ba_password values to match the user name and password that you have set for the Analytics administrator user.
    5. In the ETL Staged Message Purge (Hours) section, you can optionally change the value of etl_purge_msg_older_than_hrs. By default, all processed messages are purged after 72 hours.
    6. In the BA server cache flush period (mins) section, you can optionally change the value of etl_ba_server_cache_flush. By default, the Mondrian cache is flushed every 360 minutes (6 hours).

    You can review the default values for the config.properties.sample settings in config.properties, but do not edit config.properties directly. A sketch of a completed kettle.properties file is shown below.
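
    For orientation, here is a minimal sketch of what a completed kettle.properties file might look like for a MySQL setup. Only db_dialect, ba_serverURL, ba_admin, ba_password, etl_purge_msg_older_than_hrs and etl_ba_server_cache_flush are taken from the steps above; the property names shown for the two database connections are illustrative placeholders, so copy the real names and defaults from config.properties.sample rather than from this sketch.

    # SkyVault DATABASE (placeholder property names - use those from config.properties.sample)
    skyvault.db.url=jdbc:mysql://localhost:3306/SkyVault
    skyvault.db.user=<skyvault-db-user>
    skyvault.db.password=<skyvault-db-password>

    # ETL (WAREHOUSE) DATABASE - the DI database created in Setting up the DI database
    db_dialect=mysql
    warehouse.db.url=jdbc:mysql://localhost:3306/<di-database>
    warehouse.db.user=<di-db-user>
    warehouse.db.password=<di-db-password>

    # BA SERVER CONNECTION
    ba_serverURL=http://localhost:8080/ba-server
    ba_admin=<analytics-admin-user>
    ba_password=<analytics-admin-password>

    # ETL Staged Message Purge (Hours) - default is 72
    etl_purge_msg_older_than_hrs=72

    # BA server cache flush period (mins) - default is 360 (6 hours)
    etl_ba_server_cache_flush=360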

  2. Verify that the database parameters are correct by running the test_setup.kjb job:

    cd data-integration
    ./kitchen.sh /file:ETL/test_setup.kjb

    Alternatively, use this command if the user that created the .kettle/kettle.properties file is different from the current user:

    cd data-integration 
    sudo su <kettle-user> -c './kitchen.sh /file:ETL/test_setup.kjb'
    where <kettle-user> is the user that created the kettle.properties file.
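
    If you are not sure which user created the kettle.properties file, standard commands can help you confirm that user's home directory and the file's owner; the path below assumes the <kettle-user-home>/.kettle/ location described in step 1.

    # Look up the home directory of the user that runs the Kettle scripts
    getent passwd <kettle-user>
    # Confirm that the file exists there and check its owner
    ls -l <kettle-user-home>/.kettle/kettle.properties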

    The SkyVault, Process and Pentaho database connections should indicate that processing completed successfully with an output of result=[true]. Here is a sample log from test_setup.kjb:

    2015/01/06 16:04:52 - test_setup - Starting entry [Output database details]
    2015/01/06 16:04:52 - config - SkyVault: jdbc:mysql://localhost:3306/SkyVault 
    2015/01/06 16:04:52 - config - Process: jdbc:mysql://localhost:3306/SkyVault 
    2015/01/06 16:04:52 - config - Pentaho: jdbc:mysql://localhost:3306/pentaho_di 
    2015/01/06 16:04:52 - test_setup - Starting entry [Check SkyVault Db Connection] 
    2015/01/06 16:04:52 - test_setup - Starting entry [Check Pentaho Db connection] 
    2015/01/06 16:04:52 - test_setup - Starting entry [Check Process Db connection] 
    2015/01/06 16:04:52 - test_setup - Finished job entry [Check Process Db connection] (result=[true]) 
    2015/01/06 16:04:52 - test_setup - Finished job entry [Check Pentaho Db connection] (result=[true]) 
    2015/01/06 16:04:52 - test_setup - Finished job entry [Check SkyVault Db Connection] (result=[true]) 
    2015/01/06 16:04:52 - test_setup - Finished job entry [Output database details] (result=[true]) 
    2015/01/06 16:04:52 - test_setup - Finished job entry [set env] (result=[true]) 
    2015/01/06 16:04:52 - test_setup - Job execution finished 
    2015/01/06 16:04:52 - Kitchen - Finished! 
    2015/01/06 16:04:52 - Kitchen - Start=2015/01/06 16:04:49.979, Stop=2015/01/06 16:04:52.881 
    2015/01/06 16:04:52 - Kitchen - Processing ended after 2 seconds. 

    You are now ready to use the kitchen.sh script to run the ETL job that creates the database schema and populates the data warehouse with initial data.
  3. From the data-integration directory, run the Kettle kitchen.sh script:

    ./kitchen.sh /file:ETL/schema_setup.kjb

    If the user that created the kettle.properties file is different from the current user, use this command instead:

    cd data-integration
    sudo su <kettle-user> -c './kitchen.sh /file:ETL/schema_setup.kjb'
    where <kettle-user> is the user that created the kettle.properties file.

    This script runs the ETL job to create the schema and to populate initial data into the data warehouse.
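
    If you are scripting this step, you can use the exit status of kitchen.sh to confirm that the job completed; a zero exit status indicates success. This is a minimal sketch that assumes the same working directory and job path as above.

    cd data-integration
    ./kitchen.sh /file:ETL/schema_setup.kjb
    echo "kitchen.sh exit status: $?"   # 0 means the job finished without errors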

  4. Open ba-server/tomcat/conf/context.xml and check that the jdbc/AlfrescoDataSource Resource element contains the same database connection details that you entered in the kettle.properties file.
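
    For reference, a jdbc/AlfrescoDataSource Resource element in context.xml follows the standard Tomcat JNDI DataSource format. The sketch below is illustrative only: the driver class, URL, and credentials are placeholders for a MySQL setup, so compare the values in your own context.xml against your kettle.properties entries rather than copying this example.

    <Resource name="jdbc/AlfrescoDataSource" auth="Container"
              type="javax.sql.DataSource"
              driverClassName="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/<database>"
              username="<db-user>"
              password="<db-password>" />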