# Create CSV telemetry instances

To create an instance of the CSV provider, build a YAML file (`instance.yml` in this example) with the definition of the instance:

```yaml
# CSV Telemetry Provider Instance
provider: CSV File
config:
  address: host1.example.com
  authType: password
  username: akamas
  auth: akamas
  remoteFilePattern: /monitoring/result-*.csv
  componentColumn: COMPONENT
  timestampColumn: TS
  timestampFormat: YYYY-MM-dd'T'HH:mm:ss
metrics:
  - metric: cpu_util
    datasourceMetric: user%
```

Then you can create the instance for the system named `system` using the Akamas CLI:

```bash
akamas create telemetry-instance instance.yml system
```

#### `timestampFormat` format

Notice that the default timestamp format uses the week-year pattern `YYYY`, which is compliant with the ISO-8601 specification; however, if you specify a custom `timestampFormat`, you should use the year-of-era pattern `yyyy` instead, since the week-based year differs from the calendar year around year boundaries. For example:

* Correct: `yyyy-MM-dd HH:mm:ss`
* Wrong: `YYYY-MM-dd HH:mm:ss`

You can find detailed information on timestamp patterns in the *Patterns for Formatting and Parsing* section on the [DateTimeFormatter (Java Platform SE 8)](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html) page.
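
To see why the distinction matters, consider a date near a year boundary. The following standalone snippet (not part of the provider; just an illustration using Java's `DateTimeFormatter`, the same pattern syntax referenced above) shows the two patterns diverging:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class WeekYearDemo {
    public static void main(String[] args) {
        // 2019-12-30 is a Monday belonging to ISO week 1 of 2020, so the
        // week-based year (YYYY) differs from the year-of-era (yyyy).
        LocalDate date = LocalDate.of(2019, 12, 30);

        String yearOfEra = date.format(DateTimeFormatter.ofPattern("yyyy-MM-dd"));
        String weekYear  = date.format(DateTimeFormatter.ofPattern("YYYY-MM-dd"));

        System.out.println(yearOfEra); // 2019-12-30
        System.out.println(weekYear);  // 2020-12-30 -- off by a year!
    }
}
```

A timestamp written with `YYYY` would therefore be imported with the wrong year for samples taken in the last days of December or the first days of January.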

### Configuration options <a href="#configuration-options" id="configuration-options"></a>

When you create an instance of the CSV provider, you must provide the configuration information that allows the provider to correctly extract and process metrics from your CSV files.

You can specify this information in the `config` section of the instance definition YAML.

#### Required properties <a href="#required-properties" id="required-properties"></a>

* `address` - a hostname, URL, or IP address of the machine where the CSV files reside
* `username` - the username used when connecting to the host
* `authType` - the type of authentication to use when connecting to the file host; either `password` or `key`
* `auth` - the authentication credential: a password or a private key, according to `authType`. When using keys, the value can be either the key itself or the path of a local file to import the key from
* `remoteFilePattern` - the path of the remote file(s) to import; the path may contain GLOB expressions
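
As a sketch, a `config` section using key-based authentication over SFTP might look as follows (the key below is a placeholder, not a real credential):

```yaml
provider: CSV File
config:
  address: host1.example.com
  protocol: sftp
  username: akamas
  authType: key
  auth: |                       # multi-line private key, new lines preserved
    -----BEGIN RSA PRIVATE KEY-----
    ...placeholder, not a real key...
    -----END RSA PRIVATE KEY-----
  remoteFilePattern: /monitoring/result-*.csv
```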

#### Optional properties <a href="#optional-properties" id="optional-properties"></a>

* `protocol` - the protocol to use to retrieve files; either `scp` or `sftp`. Default is `scp`
* `fieldSeparator` - the character used as a field separator in the CSV files. Default is `,`
* `componentColumn` - the header of the column containing the name of the component. Default is `COMPONENT`
* `timestampColumn` - the header of the column containing the timestamp. Default is `TS`
* `timestampFormat` - the format of the timestamp (e.g. `yyyy-MM-dd HH:mm:ss zzz`). Default is `YYYY-MM-dd'T'HH:mm:ss`
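
For instance, a CSV file that uses semicolons as separators and custom column names could be described with a `config` fragment like the following (a sketch; the column names are illustrative):

```yaml
config:
  # ...connection properties as above...
  fieldSeparator: ";"
  componentColumn: hostname
  timestampColumn: time
  timestampFormat: yyyy-MM-dd HH:mm:ss
```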

You should also specify the mapping between the metrics available in your CSV files and those provided by Akamas. This can be done in the `metrics` section of the telemetry instance configuration. To map a custom metric you should specify at least the following properties:

* `metric` - the name of a metric in Akamas
* `datasourceMetric` - the header of a column that contains the metric in the CSV file

The provider ignores any CSV column that is not listed as a `datasourceMetric` in this section.

The sample configuration reported in this section would import the metric `cpu_util` from CSV files formatted as in the example below:

```csv
TS,                   COMPONENT,  user%
2020-04-17T09:46:30,  host,       20
2020-04-17T09:46:35,  host,       23
2020-04-17T09:46:40,  host,       32
2020-04-17T09:46:45,  host,       21
```

### Telemetry instance reference <a href="#telemetry-instance-reference" id="telemetry-instance-reference"></a>

The following represents the complete configuration reference for the telemetry provider instance.

```yaml
provider: CSV File             # this is an instance of the CSV provider
config:
  address: host1.example.com   # the address of the host with the CSV files
  port: 22                     # the port used to connect
  authType: password           # the authentication method
  username: akamas             # the username used to connect
  auth: akamas                 # the authentication credential
  protocol: scp                # the protocol used to retrieve the file
  fieldSeparator: ","          # the character used as field separator in the CSV files
  remoteFilePattern: /monitoring/result-*.csv    # the path of the CSV files to import
  componentColumn: COMPONENT                     # the header of the column with component names
  timestampColumn: TS                            # the header of the column with the time stamp
  timestampFormat: YYYY-MM-dd'T'HH:mm:ss         # the format of the timestamp
metrics:
  - metric: cpu_util                             # the name of the Akamas metric
    datasourceMetric: user%                      # the header of the column with the original metric
    staticLabels:
      mode: user                                 # (optional) additional labels to add to the metric
```

The following table reports the configuration reference for the `config` section:

<table data-full-width="false"><thead><tr><th>Field</th><th>Type</th><th>Description</th><th>Default Value</th><th>Restrictions</th><th>Required</th></tr></thead><tbody><tr><td><code>address</code></td><td>String</td><td>The address of the machine where the CSV files reside</td><td></td><td>A valid URL or IP address</td><td>Yes</td></tr><tr><td><code>port</code></td><td>Number (integer)</td><td>The port to connect to in order to retrieve the files</td><td>22</td><td>1≤<code>port</code>≤65535</td><td>No</td></tr><tr><td><code>username</code></td><td>String</td><td>The username to use in order to connect to the remote machine</td><td></td><td></td><td>Yes</td></tr><tr><td><code>protocol</code></td><td>String</td><td>The protocol used to connect to the remote machine: <a href="https://en.wikipedia.org/wiki/Secure_copy">SCP</a> or <a href="https://en.wikipedia.org/wiki/SSH_File_Transfer_Protocol">SFTP</a></td><td><code>scp</code></td><td><code>scp</code> <code>sftp</code></td><td>No</td></tr><tr><td><code>authType</code></td><td>String</td><td><p>The method used to authenticate against the remote machine:</p><ul><li><code>password</code>: use the value of the parameter <code>auth</code> as a password</li><li><code>key</code>: use the value of the parameter <code>auth</code> as a private key. Supported formats are RSA and DSA</li></ul></td><td></td><td><code>password</code> <code>key</code></td><td>Yes</td></tr><tr><td><code>auth</code></td><td>String</td><td>A password or an RSA/DSA key (as a YAML multi-line string, keeping new lines)</td><td></td><td></td><td>Yes</td></tr><tr><td><code>remoteFilePattern</code></td><td>String</td><td>The path of the remote file(s) to be analyzed. The path can contain <a href="https://en.wikipedia.org/wiki/Glob_(programming)">GLOB</a> expressions</td><td></td><td>A valid Linux path or GLOB pattern</td><td>Yes</td></tr><tr><td><code>componentColumn</code></td><td>String</td><td><p>The CSV column containing the name of the component.</p><p>The column's values must match (case sensitive) the name of a component specified in the System</p></td><td><code>COMPONENT</code></td><td>The column must exist in the CSV file</td><td>No</td></tr><tr><td><code>timestampColumn</code></td><td>String</td><td>The CSV column containing the timestamps of the samples</td><td><code>TS</code></td><td>The column must exist in the CSV file</td><td>No</td></tr><tr><td><code>timestampFormat</code></td><td>String</td><td>The format of the timestamps</td><td><code>YYYY-MM-dd'T'HH:mm:ss</code></td><td>Must be a valid <a href="https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html">DateTimeFormatter</a> pattern</td><td>No</td></tr><tr><td><code>fieldSeparator</code></td><td>String</td><td>The field separator of the CSV files</td><td><code>,</code></td><td><code>,</code> <code>;</code></td><td>No</td></tr></tbody></table>

The following table reports the configuration reference for the `metrics` section:

<table data-full-width="false"><thead><tr><th>Field</th><th>Type</th><th>Description</th><th>Restrictions</th><th>Required</th></tr></thead><tbody><tr><td><code>metric</code></td><td>String</td><td>The name of the metric in Akamas</td><td>An existing Akamas metric</td><td>Yes</td></tr><tr><td><code>datasourceMetric</code></td><td>String</td><td>The name (header) of the column that contains the specific metric</td><td>An existing column in the CSV file</td><td>Yes</td></tr><tr><td><code>scale</code></td><td>Decimal number</td><td>The scale factor to apply when importing the metric</td><td></td><td>No</td></tr><tr><td><code>staticLabels</code></td><td>List of key-value pairs</td><td>A list of key-value pairs that will be attached to the specific metric sample</td><td></td><td>No</td></tr></tbody></table>

## Use cases <a href="#hardbreak-use-cases" id="hardbreak-use-cases"></a>

Here you can find common use cases addressed by this provider.

#### Linux SAR <a href="#linux-sar" id="linux-sar"></a>

In this use case, you are going to import some metrics coming from [SAR](https://en.wikipedia.org/wiki/Sar_\(Unix\)), a popular UNIX tool to monitor system resources. SAR data can be exported to CSV files in the following format.

```csv
hostname, interval, timestamp,               %user, %system, %memory
machine1, 600,      2018-08-07 06:45:01 UTC, 30.01, 20.77,   96.21
machine1, 600,      2018-08-07 06:55:01 UTC, 40.07, 13.00,   84.55
machine1, 600,      2018-08-07 07:05:01 UTC, 5.00,  90.55,   89.23
```

Note that the metrics are percentages between 0 and 100, while Akamas expects percentages as values between 0 and 1: therefore each metric in this configuration has a scale factor of 0.01 (so, for example, `30.01` is imported as `0.3001`).

You can import the two CPU metrics and the memory metric from a SAR log using the following telemetry instance configuration (note that column names starting with `%` must be quoted to be valid YAML):

```yaml
provider: CSV File
config:
  remoteFilePattern: /csv/sar.csv
  address: 127.0.0.1
  port: 22
  username: user123
  auth: password123
  authType: password
  protocol: scp
  componentColumn: hostname
  timestampColumn: timestamp
  timestampFormat: yyyy-MM-dd HH:mm:ss zzz
metrics:
  - metric: cpu_util
    datasourceMetric: "%user"
    scale: 0.01
    staticLabels:
      mode: user
  - metric: cpu_util
    datasourceMetric: "%system"
    scale: 0.01
    staticLabels:
      mode: system
  - metric: mem_util
    datasourceMetric: "%memory"
    scale: 0.01
```

Using the configured instance, the CSV File provider will perform the following operations to import the metrics:

1. Retrieve the file `/csv/sar.csv` from the server `127.0.0.1` using the SCP protocol, authenticating with the provided password.
2. Use the column `hostname` to look up components by name.
3. Use the column `timestamp` to find the timestamps of the samples (which are expected to be in the format specified by `timestampFormat`).
4. Collect the metrics (two with the same name but different labels, and one with a different name):
   * `cpu_util`: read from the column `%user`, with the label `mode: user` attached to its samples.
   * `cpu_util`: read from the column `%system`, with the label `mode: system` attached to its samples.
   * `mem_util`: read from the column `%memory`.
