Run Standard Tiling Jobs
A typical Aperture Tiles application contains a heatmap layer that illustrates the density of numeric data at several levels over a map or plot. The process of generating the tile pyramid that represents this type of layer is a standard tiling job.
Standard tiling jobs can be executed with the CSVBinner, which is available in the Aperture Tiles source code.
For information on creating custom tiling jobs that ingest data that is not character delimited or contains non-numeric fields, see the Run Custom Tiling Jobs topic.
The CSVBinner ingests numeric, character-separated (e.g., CSV) tabular data. To define the tile set you want to create, you must pass two types of properties files to the CSVBinner:
- A base properties file, which describes the general characteristics of the data
- Tiling properties files, each of which describes the specific attributes you want to tile
During the tiling job, the CSVBinner writes a set of Avro tile data files to the location of your choice (HBase or local file system).
To execute the CSVBinner and run a standard tiling job
- Use the spark-submit script and pass in the names of the properties files you want to use. For example:
spark-submit --class com.oculusinfo.tilegen.examples.apps.CSVBinner lib/tile-generation-assembly.jar -d /data/twitter/dataset-base.bd /data/twitter/dataset.lon.lat.bd
The -d switch specifies the base properties file path, and each subsequent file path specifies a tiling properties file.
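When a job has several tiling properties files, it can help to assemble the command from variables and review it before launching. The following is a minimal sketch rather than part of the Aperture Tiles distribution: the variable names are illustrative, the second tiling properties path is hypothetical, and the paths are assumed to contain no spaces.

```shell
# Base properties file (path taken from the example command).
BASE_PROPS=/data/twitter/dataset-base.bd

# Tiling properties files, one per layer. The second path is hypothetical.
# Assumes no path contains spaces.
TILING_PROPS="/data/twitter/dataset.lon.lat.bd /data/twitter/dataset.time.lat.bd"

# Assemble the command: -d takes the base file; the tiling files follow it.
CMD="spark-submit --class com.oculusinfo.tilegen.examples.apps.CSVBinner lib/tile-generation-assembly.jar -d $BASE_PROPS $TILING_PROPS"

# Dry run: print the command for review. To launch the job, run: $CMD
echo "$CMD"
```

Printing the command first makes it easy to confirm that the base properties file immediately follows the switch and that every remaining argument is a tiling properties file.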
Base Properties Files
The base properties file describes the tiling job, the systems on which it will run and the general characteristics of the source data. The following properties must be defined in the file:
Additional properties are described in the advanced Standard Tiling topic.
- Indicate where your source data is stored:
Property | Description
oculus.binning.source.location | Filename or directory containing the source data. If set to a directory, all of its contents will be ingested. Example values:
Source Data Format
- Indicate how the columns in your source data are separated:
Property | Description
oculus.binning.parsing.separator | Character or string used to separate columns in the source data. Defaults to tab character (\t).
- Pass in the fields used to write your data to the tile pyramid. The tile generation job requires two types of fields:
- Index fields indicate where on the map/plot your data points are located
- A value field contains the value of the data points at the corresponding indexes
# Index Fields
oculus.binning.parsing.longitude.index=0
oculus.binning.parsing.longitude.fieldType=double
oculus.binning.parsing.latitude.index=1
oculus.binning.parsing.latitude.fieldType=double

# Value Field
oculus.binning.parsing.value.index=2
oculus.binning.parsing.value.fieldType=double
Property | Description
oculus.binning.parsing.<field>.index | Column number of the described field in the source data files. This property is mandatory for every field type to be used. NOTE: The oculus.binning.index.type you select in your tiling properties file determines the number of index fields you must include.
oculus.binning.parsing.<field>.fieldType | Type of value expected in the column specified by oculus.binning.parsing.<field>.index:
- double (default)
For more information on the supported field types, see the advanced Standard Tiling topic.
Source Data Manipulation
Additional properties are available for scaling fields logarithmically before they are used in the tile generation job. For more information, see the advanced Standard Tiling topic.
- Specify where to save your tile set:
Property | Description
oculus.tileio.type | Location to which tiles are written:
- file (default): Writes .avro files to the local file system. NOTE: If this type is used on a distributed cluster, the file is saved to the worker node, not the machine that initiates the tiling job.
- hbase: Writes to HBase. For further HBase configuration properties, see the following step.
- If you are saving your tile set to HBase, specify the connection properties. Otherwise, skip to the next step.
Property | Description
hbase.zookeeper.quorum | ZooKeeper quorum location needed to connect to HBase.
hbase.zookeeper.port | Port through which to connect to ZooKeeper. Defaults to 2181.
hbase.master | Location of the HBase master to which to write tiles.
- Edit the tile set name and metadata values:
Property | Description
oculus.binning.name | Name (path) of the output data tile set pyramid. If you are writing to a file system, use a relative path instead of an absolute path. If you are writing to HBase, this is used as a table name. This name is also written to the tile set metadata and used as a plot label.
oculus.binning.prefix | Optional prefix to be added to the name of every pyramid location. Used to distinguish different tile generation runs. If not present, no prefix is used.
oculus.binning.description | Description to put in the tile metadata.
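Taken together, the properties in this section might be assembled into a base properties file along the following lines. This is a sketch under stated assumptions, not a file shipped with Aperture Tiles: the output path, the source data location, the comma separator, and the tile set name and description are all hypothetical values, and only the property names and field definitions come from this topic.

```shell
# Hypothetical output path; adjust to where you keep your properties files.
BASE_PROPS="${TMPDIR:-/tmp}/dataset-base.bd"

cat > "$BASE_PROPS" <<'EOF'
# Source data location (hypothetical path)
oculus.binning.source.location=/data/twitter/input

# Source data format: comma-separated columns (assumed; the default is tab)
oculus.binning.parsing.separator=,

# Index fields
oculus.binning.parsing.longitude.index=0
oculus.binning.parsing.longitude.fieldType=double
oculus.binning.parsing.latitude.index=1
oculus.binning.parsing.latitude.fieldType=double

# Value field
oculus.binning.parsing.value.index=2
oculus.binning.parsing.value.fieldType=double

# Output: write .avro files to the local file system
oculus.tileio.type=file

# Tile set name and metadata (hypothetical values; the name is a relative path)
oculus.binning.name=twitter-heatmap
oculus.binning.description=Sample heatmap tile set
EOF

echo "Wrote $BASE_PROPS"
```

Because this sketch writes to the file system, the HBase connection properties are omitted; they would be needed only if oculus.tileio.type were set to hbase.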
Tiling Properties Files
The tiling properties files define the tiling job parameters for each layer in your visual analytic, such as which fields to bin on and how values are binned:
Additional properties are described in the advanced Standard Tiling topic.
The projection properties define the area on which your data points are plotted.
- Choose the map or plot over which to project your data points:
Property | Description
oculus.binning.projection.type | Type of projection to use when binning data. Possible values are:
- areaofinterest or EPSG:4326 (default) - Bin linearly over the whole range of values found.
- webmercator, EPSG:900913 or EPSG:3857 - Web-mercator projection. Used for geographic values only.
- If your projection type is areaofinterest, decide whether to manually set the bounds of the projection.
Property | Description
oculus.binning.projection.autobounds | Indicates how the minimum and maximum bounds should be set:
- true (default) - Automatically
- false - Manually
- If you chose to set the bounds manually, specify the minimum and maximum x- and y-axis values:
Property | Description
oculus.binning.projection.minx | Lowest x-axis value
oculus.binning.projection.maxx | Highest x-axis value
oculus.binning.projection.miny | Lowest y-axis value
oculus.binning.projection.maxy | Highest y-axis value
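As an illustration of the projection properties, the fragment below sets manual bounds for an areaofinterest projection. It is a sketch only: the output path and the bound values are hypothetical, and the fragment is not a complete tiling properties file on its own, since the index, value and level properties are still required.

```shell
# Hypothetical output path for a projection-only fragment.
BOUNDS_PROPS="${TMPDIR:-/tmp}/dataset.bounds.bd"

cat > "$BOUNDS_PROPS" <<'EOF'
# Bin linearly over a fixed, manually specified area
oculus.binning.projection.type=areaofinterest
oculus.binning.projection.autobounds=false

# Manual bounds (hypothetical values)
oculus.binning.projection.minx=0.0
oculus.binning.projection.maxx=100.0
oculus.binning.projection.miny=-50.0
oculus.binning.projection.maxy=50.0
EOF

echo "Wrote $BOUNDS_PROPS"
```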
The index properties specify the fields used to locate the binning value on the projection and map them to the corresponding tile bins.
- Specify the index scheme:
Property | Description
oculus.binning.index.type | Scheme used to indicate how the binning values are mapped to the projection:
- cartesian (default) - Cartesian (x/y) coordinates
- ipv4 - IP address (v4)
- timerange - Standard time range and cartesian point index
The scheme also determines the number of index fields you must specify.
- Map the index fields you specified in your base properties file to the fields required by the scheme you selected.
Property | Description
oculus.binning.index.field.<order> | List of fields that satisfy the chosen index scheme:
- cartesian (default):
  oculus.binning.index.field.0=<x_Field>
  oculus.binning.index.field.1=<y_Field>
- timerange:
  oculus.binning.index.field.0=<time_Field>
  oculus.binning.index.field.1=<x_Field>
  oculus.binning.index.field.2=<y_Field>
NOTE: An additional index scheme (segment) is available for tiling node and edge graph data. See the Graph Tiling topic for more information.
The value properties specify the field to use as the binning value and how multiple values in the same bin should be combined.
Specifies how to determine the values written to each bin:
oculus.binning.value.field | Field to use as the bin value. If no value field is provided, the tiling job will write the count of records in each bin as the bin value.
Type of values stored in the value field:
Method of aggregation used on the values of all records in a bin when oculus.binning.value.type = field. Creates a single value for the bin:
The oculus.binning.levels.<order> array property describes how the tiling job executes the generation of the various zoom levels. For example, if you want to generate levels in three groups, you should include:
oculus.binning.levels.0
oculus.binning.levels.1
oculus.binning.levels.2
Each group should describe the zoom levels to bin simultaneously as a comma-separated list of individual integers or a range of integers (described as start-end). For example, "0-3,5" means levels 0, 1, 2, 3, and 5.
This property is mandatory, and has no default.
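To make the level-set syntax concrete, the following shell function expands a specification such as "0-3,5" into the individual zoom levels it covers. It is an illustrative helper only: the name expand_levels is hypothetical and the function is not part of the CSVBinner, which parses these values itself.

```shell
# Expand a level-set spec such as "0-3,5" into individual zoom levels.
# Hypothetical helper for checking a spec by hand; not part of the CSVBinner.
expand_levels() {
    spec=$1
    out=""
    old_ifs=$IFS
    IFS=,
    # Split the spec on commas; each part is a single level or a start-end range.
    for part in $spec; do
        case $part in
            *-*)
                start=${part%-*}
                end=${part#*-}
                level=$start
                while [ "$level" -le "$end" ]; do
                    out="$out $level"
                    level=$((level + 1))
                done
                ;;
            *)
                out="$out $part"
                ;;
        esac
    done
    IFS=$old_ifs
    # Trim the leading space before printing.
    echo "${out# }"
}

expand_levels "0-3,5"   # prints: 0 1 2 3 5
```

Running a specification through a helper like this before submitting a job is a quick way to confirm which levels each group will bin together.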
Defining Level Sets
Which levels you should bin together depends both on the size of your cluster and your data. Note that if you include multiple level sets, the raw data is parsed once and cached for use with each level set.
Each binning job has two costs: overhead and tiling. In our cluster:
- Overhead cost is generally dominant from levels 0-8. Tiling these levels together will reduce job time.
- Tiling cost is dominant above level 8. Binning these levels together risks out-of-memory job failures because of the large number of tiles generated.
Therefore, our typical use case has:
oculus.binning.levels.0=0-8
oculus.binning.levels.1=9
oculus.binning.levels.2=10
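Putting the projection, index, value and level properties together, a tiling properties file for a geographic heatmap might look like the sketch below. It assumes the base properties file defines fields named longitude, latitude and value, as in the field example earlier in this topic; the output path is hypothetical, and the level groups follow the typical use case described above.

```shell
# Hypothetical output path for the tiling properties file.
TILING_PROPS="${TMPDIR:-/tmp}/dataset.lon.lat.bd"

cat > "$TILING_PROPS" <<'EOF'
# Projection: geographic data on a web-mercator map
oculus.binning.projection.type=webmercator

# Index: cartesian scheme needs two fields, x then y
oculus.binning.index.type=cartesian
oculus.binning.index.field.0=longitude
oculus.binning.index.field.1=latitude

# Value: bin on the field named "value" from the base properties file
oculus.binning.value.type=field
oculus.binning.value.field=value

# Levels: bin 0-8 together, then 9 and 10 separately
oculus.binning.levels.0=0-8
oculus.binning.levels.1=9
oculus.binning.levels.2=10
EOF

echo "Wrote $TILING_PROPS"
```

Omitting the two value properties would, per the description of oculus.binning.value.field, cause the job to write the count of records in each bin instead.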
For details on testing the output of your tiling job, see the Testing Tiling Job Output topic.