Run Standard Tiling Jobs

A typical Aperture Tiles application contains a heatmap layer that illustrates the density of numeric data at several zoom levels over a map or plot. The process of generating the tile pyramid that represents this type of layer is a standard tiling job.

Standard tiling jobs can be executed with the CSVBinner, which is available in the Aperture Tiles source code.

For information on creating custom tiling jobs that ingest data that is not character-delimited or that contains non-numeric fields, see the Run Custom Tiling Jobs topic.

CSVBinner

The CSVBinner ingests numeric, character-separated (e.g., CSV) tabular data. To define the tile set you want to create, you must pass two types of properties files to the CSVBinner:

  • A base properties file, which describes the general characteristics of the data
  • Tiling properties files, each of which describes the specific attributes you want to tile

During the tiling job, the CSVBinner writes a set of Avro tile data files to the location of your choice (HBase or local file system).

To execute the CSVBinner and run a standard tiling job
  • Use the spark-submit script and pass in the names of the properties files you want to use. For example:

    spark-submit --class com.oculusinfo.tilegen.examples.apps.CSVBinner \
        lib/tile-generation-assembly.jar -d /data/twitter/dataset-base.bd \
        /data/twitter/dataset.lon.lat.bd
    

    In this command, the -d switch specifies the path to the base properties file, and each subsequent path specifies a tiling properties file.

Base Properties Files

The base properties file describes the tiling job, the systems on which it will run, and the general characteristics of the source data. The properties described in the following sections must be defined in the file.

Additional properties are described in the advanced Standard Tiling topic.

Source Location

  • Indicate where your source data is stored:

    Property Description
    oculus.binning.source.location Filename or directory containing the source data. If set to a directory, all of its contents will be ingested. Example values:
    • file://data/test.data
    • hdfs://hadoop-s1/data/julia/500by200
    • datasets/julia
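
As a minimal sketch, the source location is set with a single key=value line in the base properties file. The path below is one of the example values listed above, not a required location:

```properties
# Ingest every file in this HDFS directory (path taken from the example values above)
oculus.binning.source.location=hdfs://hadoop-s1/data/julia/500by200
```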

Source Data Format

  1. Indicate how the columns in your source data are separated:

    Property Description
    oculus.binning.parsing.separator Character or string used to separate columns in the source data. Defaults to tab character (\t).
  2. Pass in the fields used to write your data to the tile pyramid. The tile generation job requires two types of fields:

    • Index fields indicate where on the map/plot your data points are located
    • A value field contains the value of the data points at the corresponding indexes

    For example:

    # Index Fields
    oculus.binning.parsing.longitude.index=0
    oculus.binning.parsing.longitude.fieldType=double
    oculus.binning.parsing.latitude.index=1
    oculus.binning.parsing.latitude.fieldType=double
    
    # Value Field
    oculus.binning.parsing.value.index=2
    oculus.binning.parsing.value.fieldType=double
    

    Where:

    Property Description
    oculus.binning.parsing.<field>.index Column number of the described field in the source data files. This property is mandatory for every field type to be used.

    NOTE: The oculus.binning.index.type you select in your tiling properties file determines the number of index fields you must include.

    oculus.binning.parsing.<field>.fieldType Type of value expected in the column specified by oculus.binning.parsing.<field>.index:
    • double (*default*)
    • constant
    • zero
    • int
    • long
    • date
    • boolean

    For more information on the supported field types, see the advanced Standard Tiling topic.
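
The separator and field properties described in these steps might be combined as follows for a comma-separated file. This is only a sketch: the column layout (longitude, latitude and value in columns 0 to 2) is the one used in the example above, and the comma separator is an assumption made for illustration.

```properties
# Assumed for illustration: columns are separated by commas rather than the default tab
oculus.binning.parsing.separator=,

# Index fields (column numbers as in the example above)
oculus.binning.parsing.longitude.index=0
oculus.binning.parsing.latitude.index=1

# Value field
oculus.binning.parsing.value.index=2

# No fieldType is set here, so each field falls back to the documented default (double)
```

The field names chosen here (longitude, latitude, value) are the names you refer to later in the tiling properties files.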

Source Data Manipulation

Additional properties are available for scaling fields logarithmically before they are used in the tile generation job. For more information, see the advanced Standard Tiling topic.

Tile Storage

  1. Specify where to save your tile set:

    Property Description
    oculus.tileio.type Location to which tiles are written:
    • file (default): Writes .avro files to the local file system. NOTE: If this type is used on a distributed cluster, the file is saved to the worker node, not the machine that initiates the tiling job.
    • hbase: Writes to HBase. For further HBase configuration properties, see the following step.
  2. If you are saving your tile set to HBase, specify the connection properties. Otherwise, skip to the next step.

    Property Description
    hbase.zookeeper.quorum ZooKeeper quorum location needed to connect to HBase.
    hbase.zookeeper.port Port through which to connect to ZooKeeper. Defaults to 2181.
    hbase.master Location of the HBase master to which to write tiles.
  3. Edit the tile set name and metadata values:

    Property Description
    oculus.binning.name Name (path) of the output data tile set pyramid. If you are writing to a file system, use a relative path instead of an absolute path. If you are writing to HBase, this is used as a table name. This name is also written to the tile set metadata and used as a plot label.
    oculus.binning.prefix Optional prefix to be added to the name of every pyramid location. Used to distinguish different tile generation runs. If not present, no prefix is used.
    oculus.binning.description Description to put in the tile metadata.
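
The tile storage steps above could be sketched as follows for a tile set written to HBase. All host names, the tile set name, the prefix and the description are hypothetical placeholders, and the host:port form used for hbase.master is an assumption; substitute the values for your own cluster.

```properties
# Write tiles to HBase instead of the local file system (the default is file)
oculus.tileio.type=hbase

# Hypothetical hosts: replace with your own ZooKeeper quorum and HBase master
hbase.zookeeper.quorum=zk1.example.com
hbase.zookeeper.port=2181
# The host:port format is an assumption; check your HBase configuration
hbase.master=hbase-master.example.com:60000

# Hypothetical tile set name; with HBase it is used as the table name
oculus.binning.name=twitter-density
# Optional prefix used to tell tile generation runs apart
oculus.binning.prefix=run1
oculus.binning.description=Density of Twitter messages by location
```

If you write to the local file system instead, the three hbase.* properties can be left out, and oculus.binning.name should be a relative path.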

Tiling Properties Files

The tiling properties files define the tiling job parameters for each layer in your visual analytic, such as which fields to bin on and how values are binned.

Additional properties are described in the advanced Standard Tiling topic.

Projection

The projection properties define the area on which your data points are plotted.

  1. Choose the map or plot over which to project your data points:

    Property Description
    oculus.binning.projection.type Type of projection to use when binning data. Possible values are:
    • areaofinterest or EPSG:4326 (default) - Bin linearly over the whole range of values found.
    • webmercator, EPSG:900913 or EPSG:3857 - Web-mercator projection. Used for geographic values only.
  2. If your projection type is areaofinterest, decide whether to manually set the bounds of the projection.

    Property Description
    oculus.binning.projection.autobounds Indicates how the minimum and maximum bounds should be set:
    • true (default) - Automatically
    • false - Manually
    Note that this property is not applicable to webmercator projection.
  3. If you chose to set the bounds manually, specify the minimum and maximum x- and y-axis values:

    Property Description
    oculus.binning.projection.minx Lowest x-axis value
    oculus.binning.projection.maxx Highest x-axis value
    oculus.binning.projection.miny Lowest y-axis value
    oculus.binning.projection.maxy Highest y-axis value
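
As a sketch of the three steps above, a tiling properties file that uses the areaofinterest projection with manually set bounds might contain the following. The bounds shown (the full longitude and latitude range) are hypothetical values chosen for illustration:

```properties
# Bin linearly over a fixed area instead of the range found in the data
oculus.binning.projection.type=areaofinterest
oculus.binning.projection.autobounds=false

# Hypothetical bounds: x is longitude, y is latitude
oculus.binning.projection.minx=-180.0
oculus.binning.projection.maxx=180.0
oculus.binning.projection.miny=-90.0
oculus.binning.projection.maxy=90.0
```

With autobounds left at its default (true), the four min/max properties are not needed.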

Index

The index properties specify the fields used to locate the binning value on the projection and map them to the corresponding tile bins.

  1. Specify the index scheme:

    Property Description
    oculus.binning.index.type Scheme used to indicate how the binning values are mapped to the projection:
    • cartesian (default) - Cartesian (x/y) coordinates
    • ipv4 - IP address (v4)
    • timerange - Standard time range and cartesian point index

    The scheme also determines the number of index fields you must specify.
  2. Map the index fields you specified in your base properties file to the fields required by the scheme you selected.

    Property Description
    oculus.binning.index.field.<order> List of fields that satisfy the chosen index scheme:
    cartesian (default):
      oculus.binning.index.field.0=<x_Field>
      oculus.binning.index.field.1=<y_Field>
    ipv4:
      oculus.binning.index.field.0=<IPv4_Field>
    timerange:
      oculus.binning.index.field.0=<time_Field>
      oculus.binning.index.field.1=<x_Field>
      oculus.binning.index.field.2=<y_Field>

NOTE: An additional index scheme (segment) is available for tiling node and edge graph data. See the Graph Tiling topic for more information.
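
For the cartesian scheme, the index properties might look like the following sketch. It assumes the base properties file defines fields named longitude and latitude, as in the oculus.binning.parsing.longitude.index and oculus.binning.parsing.latitude.index example in the Source Data Format section:

```properties
# Two index fields are required by the cartesian scheme
oculus.binning.index.type=cartesian

# x-axis field, then y-axis field, using the field names from the base properties file
oculus.binning.index.field.0=longitude
oculus.binning.index.field.1=latitude
```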

Value

The value properties specify the field to use as the binning value and how multiple values in the same bin should be combined.

Property Description
oculus.binning.value.type Specifies how to determine the values written to each bin:
  • count (default) - Count the number of records in a bin. If selected, no value field is required.
  • field - Perform an aggregation on the values of all records in a bin to create a single value for the bin
  • series - Save the values of all records in a bin to a dense array
oculus.binning.value.field Field to use as the bin value. If no value field is provided, the tiling job will write the count of records in each bin as the bin value.
oculus.binning.value.valueType Type of values stored in the value field:
  • double (default) - Real, double-precision floating-point numbers
  • int - Integers
  • long - Long (64-bit) integers
  • float - Single-precision floating-point numbers
oculus.binning.value.aggregation Method of aggregation used on the values of all records in a bin when oculus.binning.value.type = field. Creates a single value for the bin:
  • sum (default) - Sum the numeric values of all records in the bin.
  • min - Select the minimum numeric value from the records in the bin.
  • max - Select the maximum numeric value from the records in the bin.
  • mean - Calculate the mean numeric value of all records in the bin.
  • stats - Calculate the mean and standard deviation of the numeric values of all records in the bin.
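
As an illustration, the value properties could be set as follows to write the mean of a numeric field to each bin. The sketch assumes the base properties file defines a field named value (as in the oculus.binning.parsing.value.index example); the choice of mean is arbitrary:

```properties
# Aggregate a field rather than counting records
oculus.binning.value.type=field

# Field name as defined in the base properties file
oculus.binning.value.field=value
oculus.binning.value.valueType=double

# Only applies because oculus.binning.value.type is set to field
oculus.binning.value.aggregation=mean
```

To produce a simple density heatmap instead, the whole group can be omitted, since oculus.binning.value.type defaults to count and needs no value field.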

Levels

The oculus.binning.levels.<order> array property specifies which zoom levels the tiling job generates and how those levels are grouped for binning. For example, to generate levels in three groups, include:

oculus.binning.levels.0
oculus.binning.levels.1
oculus.binning.levels.2

Each group should describe the zoom levels to bin simultaneously as a comma-separated list of individual integers or a range of integers (described as start-end). For example, "0-3,5" means levels 0, 1, 2, 3, and 5.

This property is mandatory, and has no default.

Defining Level Sets

Which levels you should bin together depends both on the size of your cluster and your data. Note that if you include multiple level sets, the raw data is parsed once and cached for use with each level set.

Each binning job has two costs: overhead and tiling. In our cluster:

  • Overhead cost is generally dominant from levels 0-8. Tiling these levels together will reduce job time.
  • Tiling cost is dominant above level 8. Binning several of these levels simultaneously risks out-of-memory job failures because of the large number of tiles generated.

Therefore, our typical use case has:

oculus.binning.levels.0=0-8
oculus.binning.levels.1=9
oculus.binning.levels.2=10

Next Steps

For details on testing the output of your tiling job, see the Testing Tiling Job Output topic.