# Create CSV telemetry instances

To create an instance of the CSV provider, build a YAML file (`instance.yml` in this example) with the definition of the instance:

{% code lineNumbers="true" %}

```yaml
# CSV Telemetry Provider Instance
provider: CSV File
config:
 address: host1.example.com
 authType: password
 username: akamas
 auth: akamas
 remoteFilePattern: /monitoring/result-*.csv
 componentColumn: COMPONENT
 timestampColumn: TS
 timestampFormat: YYYY-MM-dd'T'HH:mm:ss
metrics:
 - metric: cpu_util
   datasourceMetric: user%
```

{% endcode %}

Then create the instance for your system (named `system` in this example) using the Akamas CLI:

```bash
akamas create telemetry-instance instance.yml system
```

#### `timestampFormat` format

Regarding the timestamp format, note that while the week-based-year pattern `YYYY` is compliant with the ISO-8601 specification, you should replace it with the year-of-era pattern `yyyy` if you specify a `timestampFormat` different from the ISO one. For example:

* Correct: `yyyy-MM-dd HH:mm:ss`
* Wrong: `YYYY-MM-dd HH:mm:ss`

You can find detailed information on timestamp patterns in the *Patterns for Formatting and Parsing* section on the [DateTimeFormatter (Java Platform SE 8)](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html) page.
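To see why the two patterns differ, the following Java sketch uses `java.time.format.DateTimeFormatter` directly and does not involve Akamas. `yyyy` is the year-of-era. `YYYY` is the week-based-year, which can differ from the calendar year around New Year. When parsing, `YYYY` alone is also not enough to resolve a full date. The class and method names are illustrative. `Locale.US` is fixed only to make the week definition deterministic.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

public class TimestampPatterns {

    // Formats a date with the given pattern. Locale.US is fixed so that the
    // week definition behind "YYYY" (week-based-year) is deterministic.
    static String format(LocalDate date, String pattern) {
        return date.format(DateTimeFormatter.ofPattern(pattern, Locale.US));
    }

    // Returns true if the text can be parsed into a LocalDateTime with the pattern.
    static boolean parses(String text, String pattern) {
        try {
            LocalDateTime.parse(text, DateTimeFormatter.ofPattern(pattern, Locale.US));
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2019, 12, 30);
        // "yyyy" is the year-of-era: prints 2019-12-30
        System.out.println(format(date, "yyyy-MM-dd"));
        // "YYYY" is the week-based-year: 30 Dec 2019 belongs to the first week
        // of 2020, so this prints 2020-12-30
        System.out.println(format(date, "YYYY-MM-dd"));
        // Parsing with the non-ISO patterns from the list above
        System.out.println(parses("2018-08-07 06:45:01", "yyyy-MM-dd HH:mm:ss")); // true
        System.out.println(parses("2018-08-07 06:45:01", "YYYY-MM-dd HH:mm:ss")); // false
    }
}
```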

### Configuration options <a href="#configuration-options" id="configuration-options"></a>

When you create an instance of the CSV provider, you must provide the configuration information that allows the provider to correctly extract and process metrics from your CSV files.

You can specify configuration information within the `config` part of the YAML of the instance definition.

#### Required properties <a href="#required-properties" id="required-properties"></a>

* `address` - a URL or IP identifying the address of the host where CSV files reside
* `username` - the username used when connecting to the host
* `authType` - the type of authentication to use when connecting to the file host; either `password` or `key`
* `auth` - the authentication credential; either a password or a key, according to `authType`. When using keys, the value can be either the key itself or the path of the file to import it from
* `remoteFilePattern` - a list of paths of the remote files to import; paths can contain GLOB expressions
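The following Java sketch gives a rough idea of how a GLOB pattern such as `/monitoring/result-*.csv` selects files. It uses the JDK's `PathMatcher` with the `glob:` syntax. It assumes the provider follows the usual glob semantics on a Linux host, where, for example, `*` does not match across `/`. This page does not spell those semantics out, and the file names below are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;

public class RemoteFilePatternDemo {

    // Returns the candidate paths matched by a glob such as "/monitoring/result-*.csv".
    static List<String> matching(String glob, List<String> candidates) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return candidates.stream()
                .filter(p -> matcher.matches(Path.of(p)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical file names on the remote host
        List<String> files = List.of(
                "/monitoring/result-1.csv",
                "/monitoring/result-2020-04-17.csv",
                "/monitoring/summary.csv",
                "/monitoring/result-old/run.csv");
        // "*" does not cross "/", so only the first two files match
        System.out.println(matching("/monitoring/result-*.csv", files));
    }
}
```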

#### Optional properties <a href="#optional-properties" id="optional-properties"></a>

* `protocol` - the protocol to use to retrieve files; either `scp` or `sftp`. Default is `scp`
* `port` - the port to connect to on the file host. Default is `22`
* `fieldSeparator` - the character used as a field separator in the csv files. Default is `,`
* `componentColumn` - the header of the column containing the name of the component. Default is `COMPONENT`
* `timestampColumn` - the header of the column containing the timestamp. Default is `TS`
* `timestampFormat` - the format of the timestamp (e.g. `yyyy-MM-dd HH:mm:ss zzz`). Default is `YYYY-MM-dd'T'HH:mm:ss`

You should also specify the mapping between the metrics available in your CSV files and those provided by Akamas. This can be done in the `metrics` section of the telemetry instance configuration. To map a custom metric you should specify at least the following properties:

* `metric` - the name of a metric in Akamas
* `datasourceMetric` - the header of a column that contains the metric in the CSV file

The provider ignores any column not present as `datasourceMetric` in this section.

The sample configuration at the beginning of this page imports the metric `cpu_util` from CSV files formatted as in the following example:

{% code lineNumbers="true" %}

```csv
TS,COMPONENT,user%
2020-04-17T09:46:30,host,20
2020-04-17T09:46:35,host,23
2020-04-17T09:46:40,host,32
2020-04-17T09:46:45,host,21
```

{% endcode %}
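To make the mapping concrete, here is a hypothetical Java sketch that turns rows like the ones above into samples. It is not the provider's implementation. It locates the component and timestamp columns by header, reads only the columns listed as `datasourceMetric`, and ignores the rest. The helper `toSamples` and the `Sample` record are illustrative names. The sketch trims whitespace around fields, which is an assumption about the provider. It uses the `yyyy` pattern, since plain `java.time` cannot build a date from `YYYY`. It assumes Java 17.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class CsvMappingSketch {

    // One imported sample: Akamas metric, component, timestamp and value.
    record Sample(String metric, String component, LocalDateTime timestamp, double value) {}

    // Hypothetical mapping of CSV rows to samples. metricsByColumn maps each
    // datasourceMetric (column header) to an Akamas metric; other columns are ignored.
    static List<Sample> toSamples(List<String> lines, String fieldSeparator,
                                  String componentColumn, String timestampColumn,
                                  String timestampFormat, Map<String, String> metricsByColumn) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(timestampFormat);
        Pattern separator = Pattern.compile(Pattern.quote(fieldSeparator));
        // Whitespace around header names is trimmed (an assumption about the provider)
        List<String> header = Arrays.stream(separator.split(lines.get(0)))
                .map(String::trim)
                .toList();
        int componentIdx = requireColumn(header, componentColumn);
        int timestampIdx = requireColumn(header, timestampColumn);
        List<Sample> samples = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = separator.split(line);
            String component = fields[componentIdx].trim();
            LocalDateTime ts = LocalDateTime.parse(fields[timestampIdx].trim(), formatter);
            for (Map.Entry<String, String> mapping : metricsByColumn.entrySet()) {
                int idx = requireColumn(header, mapping.getKey());
                double value = Double.parseDouble(fields[idx].trim());
                samples.add(new Sample(mapping.getValue(), component, ts, value));
            }
        }
        return samples;
    }

    // Every configured column must exist in the header.
    static int requireColumn(List<String> header, String column) {
        int idx = header.indexOf(column);
        if (idx < 0) {
            throw new IllegalArgumentException("column not found: " + column);
        }
        return idx;
    }

    public static void main(String[] args) {
        List<String> csv = List.of(
                "TS,COMPONENT,user%",
                "2020-04-17T09:46:30,host,20",
                "2020-04-17T09:46:35,host,23");
        Map<String, String> metrics = new LinkedHashMap<>();
        metrics.put("user%", "cpu_util"); // datasourceMetric -> metric
        toSamples(csv, ",", "COMPONENT", "TS", "yyyy-MM-dd'T'HH:mm:ss", metrics)
                .forEach(System.out::println);
    }
}
```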

### Telemetry instance reference <a href="#telemetry-instance-reference" id="telemetry-instance-reference"></a>

The following represents the complete configuration reference for the telemetry provider instance.

{% code lineNumbers="true" %}

```yaml
provider: CSV File            # this is an instance of the CSV provider
config:
 address: host1.example.com   # the address of the host with the CSV files
 port: 22                     # the port used to connect
 authType: password           # the authentication method
 username: akamas             # the username used to connect
 auth: akamas                 # the authentication credential
 protocol: scp                # the protocol used to retrieve the files
 fieldSeparator: ","          # the character used as field separator in the CSV files
 remoteFilePattern: /monitoring/result-*.csv    # the path of the CSV files to import
 componentColumn: COMPONENT                     # the header of the column with component names
 timestampColumn: TS                            # the header of the column with the timestamp
 timestampFormat: YYYY-MM-dd'T'HH:mm:ss         # the format of the timestamp
metrics:
 - metric: cpu_util                             # the name of the Akamas metric
   datasourceMetric: user%                      # the header of the column with the original metric
   staticLabels:
    mode: user                                  # (optional) additional labels to add to the metric
```

{% endcode %}

The following table reports the configuration reference for the `config` section:

<table data-full-width="false"><thead><tr><th>Field</th><th>Type</th><th>Description</th><th>Default Value</th><th>Restrictions</th><th>Required</th></tr></thead><tbody><tr><td><code>address</code></td><td>String</td><td>The address of the machine where the CSV file resides</td><td></td><td>A valid URL or IP</td><td><strong>Yes</strong></td></tr><tr><td><code>port</code></td><td>Number (integer)</td><td>The port to connect to in order to retrieve the file</td><td>22</td><td>1≤<code>port</code>≤65535</td><td>No</td></tr><tr><td><code>username</code></td><td>String</td><td>The username to use to connect to the remote machine</td><td></td><td></td><td>Yes</td></tr><tr><td><code>protocol</code></td><td>String</td><td>The protocol used to connect to the remote machine: <a href="https://en.wikipedia.org/wiki/Secure_copy">SCP</a> or <a href="https://en.wikipedia.org/wiki/SSH_File_Transfer_Protocol">SFTP</a></td><td><code>scp</code></td><td><code>scp</code> <code>sftp</code></td><td>No</td></tr><tr><td><code>authType</code></td><td>String</td><td><p>Specify which method is used to authenticate against the remote machine:</p><ul><li>password: use the value of the parameter <code>auth</code> as a password</li><li>key: use the value of the parameter <code>auth</code> as a private key. Supported formats are RSA and DSA</li></ul></td><td></td><td><code>password</code> <code>key</code></td><td>Yes</td></tr><tr><td><code>auth</code></td><td>String</td><td>A password or an RSA/DSA key (as YAML multi-line string, keeping new lines)</td><td></td><td></td><td>Yes</td></tr><tr><td><code>remoteFilePattern</code></td><td>String</td><td>The path of the remote file(s) to be analyzed. The path can contain <a href="https://en.wikipedia.org/wiki/Glob_(programming)">GLOB</a> expressions</td><td></td><td>A list of valid Linux paths</td><td>Yes</td></tr><tr><td><code>componentColumn</code></td><td>String</td><td><p>The CSV column containing the name of the component.</p><p>The column's values must match (case sensitive) the name of a component specified in the System</p></td><td><code>COMPONENT</code></td><td>The column must exist in the CSV file</td><td>No</td></tr><tr><td><code>timestampColumn</code></td><td>String</td><td>The CSV column containing the timestamps of the samples</td><td><code>TS</code></td><td>The column must exist in the CSV file</td><td>No</td></tr><tr><td><code>timestampFormat</code></td><td>String</td><td>The format of the timestamps</td><td><code>YYYY-MM-dd'T'HH:mm:ss</code></td><td>Must be specified using <a href="https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html">Java syntax</a>.</td><td>No</td></tr><tr><td><code>fieldSeparator</code></td><td>String</td><td>Specify the field separator of the CSV</td><td><code>,</code></td><td><code>,</code> <code>;</code></td><td>No</td></tr></tbody></table>
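The restrictions in this table can be read as simple validation rules. The following Java sketch checks a subset of them. The `CsvConfig` record and the `validate` method are hypothetical and only mirror the table. The port check uses the range of valid TCP ports, 1 to 65535. The actual provider may validate differently. It assumes Java 17.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvConfigCheck {

    // Hypothetical subset of the config section, with defaults already filled in.
    record CsvConfig(String address, int port, String username, String protocol,
                     String authType, String auth, String remoteFilePattern,
                     String fieldSeparator) {}

    // Returns the violated rules; an empty list means all checks passed.
    static List<String> validate(CsvConfig c) {
        List<String> errors = new ArrayList<>();
        if (isBlank(c.address())) errors.add("address is required");
        if (isBlank(c.username())) errors.add("username is required");
        if (isBlank(c.auth())) errors.add("auth is required");
        if (isBlank(c.remoteFilePattern())) errors.add("remoteFilePattern is required");
        if (c.port() < 1 || c.port() > 65535) errors.add("port must be between 1 and 65535");
        if (!oneOf(c.protocol(), "scp", "sftp")) errors.add("protocol must be scp or sftp");
        if (!oneOf(c.authType(), "password", "key")) errors.add("authType must be password or key");
        if (!oneOf(c.fieldSeparator(), ",", ";")) errors.add("fieldSeparator must be , or ;");
        return errors;
    }

    static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }

    // Arrays.asList tolerates null lookups, unlike List.of or Set.of.
    static boolean oneOf(String value, String... allowed) {
        return Arrays.asList(allowed).contains(value);
    }

    public static void main(String[] args) {
        CsvConfig ok = new CsvConfig("host1.example.com", 22, "akamas", "scp",
                "password", "akamas", "/monitoring/result-*.csv", ",");
        CsvConfig bad = new CsvConfig("host1.example.com", 0, "akamas", "ftp",
                "password", "akamas", "/monitoring/result-*.csv", ",");
        System.out.println(validate(ok));  // []
        System.out.println(validate(bad)); // port and protocol errors
    }
}
```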

The following table reports the configuration reference for the `metrics` section:

<table data-full-width="false"><thead><tr><th>Field</th><th>Type</th><th>Description</th><th>Restrictions</th><th>Required</th></tr></thead><tbody><tr><td><code>metric</code></td><td>String</td><td>The name of the metric in Akamas</td><td>An existing Akamas metric</td><td>Yes</td></tr><tr><td><code>datasourceMetric</code></td><td>String</td><td>The name (header) of the column that contains the specific metric</td><td>An existing column in the CSV file</td><td>Yes</td></tr><tr><td><code>scale</code></td><td>Decimal number</td><td>The scale factor to apply when importing the metric</td><td></td><td>No</td></tr><tr><td><code>staticLabels</code></td><td>List of key-value pairs</td><td>A list of key-value pairs that will be attached to each sample of the metric</td><td></td><td>No</td></tr></tbody></table>

## Use cases <a href="#hardbreak-use-cases" id="hardbreak-use-cases"></a>

Here you can find common use cases addressed by this provider.

#### Linux SAR <a href="#linux-sar" id="linux-sar"></a>

In this use case, you are going to import some metrics coming from [SAR](https://en.wikipedia.org/wiki/Sar_\(Unix\)), a popular UNIX tool to monitor system resources. SAR can export CSV files in the following format.

{% code lineNumbers="true" %}

```csv
hostname,interval,timestamp,%user,%system,%memory
machine1,600,2018-08-07 06:45:01 UTC,30.01,20.77,96.21
machine1,600,2018-08-07 06:55:01 UTC,40.07,13.00,84.55
machine1,600,2018-08-07 07:05:01 UTC,5.00,90.55,89.23
```

{% endcode %}

Note that these metrics are percentages (between 0 and 100), while Akamas expects percentages as values between 0 and 1; therefore, each metric in this configuration has a scale factor of 0.01.

You can import the two CPU metrics and the memory metric from a SAR log using the following telemetry instance configuration.

{% code lineNumbers="true" %}

```yaml
provider: CSV File
config:
  remoteFilePattern: /csv/sar.csv
  address: 127.0.0.1
  port: 22
  username: user123
  auth: password123
  authType: password
  protocol: scp
  componentColumn: hostname
  timestampColumn: timestamp
  timestampFormat: yyyy-MM-dd HH:mm:ss zzz
metrics:
- metric: cpu_util
  datasourceMetric: "%user"    # quoted: a plain YAML scalar cannot start with "%"
  scale: 0.01
  staticLabels:
    mode: user
- metric: cpu_util
  datasourceMetric: "%system"
  scale: 0.01
  staticLabels:
    mode: system
- metric: mem_util
  scale: 0.01
  datasourceMetric: "%memory"
```

{% endcode %}
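Assume the scale factor $$s$$ multiplies each raw value read from the CSV column. Turning a percentage between 0 and 100 into a value between 0 and 1 then means dividing by 100, that is, using $$s = 0.01$$. For the first SAR row, this works out as:

$$
v_{\text{Akamas}} = s \cdot v_{\text{CSV}}, \qquad s = \frac{1}{100} = 0.01
$$

$$
\text{\%user}: 0.01 \times 30.01 = 0.3001, \quad \text{\%system}: 0.01 \times 20.77 = 0.2077, \quad \text{\%memory}: 0.01 \times 96.21 = 0.9621
$$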

Using the configured instance, the CSV File provider performs the following operations to import the metrics:

1. Retrieve the file `/csv/sar.csv` from the server `127.0.0.1` using the SCP protocol, authenticating with the provided password.
2. Use the column `hostname` to look up components by name.
3. Use the column `timestamp` to find the timestamps of the samples, which are expected to be in the format specified by `timestampFormat`.
4. Collect the metrics (two with the same name but different labels, and one with a different name), applying the configured scale factor:
   * `cpu_util` from the column *%user*, attaching to its samples the label `mode` with value `user`.
   * `cpu_util` from the column *%system*, attaching to its samples the label `mode` with value `system`.
   * `mem_util` from the column *%memory*.
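The steps above can be sketched in Java for a single SAR row. This is a hypothetical illustration, not the provider's code. The `MetricMapping` and `Sample` records and the `convertRow` method are illustrative names, and the row is given as a map from column header to raw value. Each value is assumed to be multiplied by its scale factor of 0.01, which turns a 0–100 percentage into a 0–1 fraction. Timestamps are parsed with the `yyyy-MM-dd HH:mm:ss zzz` pattern from the configuration. It assumes Java 17.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SarImportSketch {

    // One entry of the metrics section: CSV column, Akamas metric, scale and static labels.
    record MetricMapping(String datasourceMetric, String metric, double scale,
                         Map<String, String> staticLabels) {}

    // One resulting sample for a component, carrying the static labels.
    record Sample(String component, String metric, Map<String, String> labels,
                  ZonedDateTime timestamp, double value) {}

    // timestampFormat from the configuration; "zzz" parses zone names such as "UTC".
    static final DateTimeFormatter TIMESTAMP_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss zzz", Locale.US);

    // Converts one SAR row (column header -> raw value) into samples.
    static List<Sample> convertRow(Map<String, String> row, List<MetricMapping> mappings) {
        String component = row.get("hostname"); // componentColumn
        ZonedDateTime ts = ZonedDateTime.parse(row.get("timestamp"), TIMESTAMP_FORMAT); // timestampColumn
        List<Sample> samples = new ArrayList<>();
        for (MetricMapping m : mappings) {
            // Assumption: the scale factor multiplies the raw value
            double value = Double.parseDouble(row.get(m.datasourceMetric())) * m.scale();
            samples.add(new Sample(component, m.metric(), m.staticLabels(), ts, value));
        }
        return samples;
    }

    public static void main(String[] args) {
        List<MetricMapping> mappings = List.of(
                new MetricMapping("%user", "cpu_util", 0.01, Map.of("mode", "user")),
                new MetricMapping("%system", "cpu_util", 0.01, Map.of("mode", "system")),
                new MetricMapping("%memory", "mem_util", 0.01, Map.of()));
        // First data row of the SAR file
        Map<String, String> row = Map.of(
                "hostname", "machine1", "interval", "600",
                "timestamp", "2018-08-07 06:45:01 UTC",
                "%user", "30.01", "%system", "20.77", "%memory", "96.21");
        convertRow(row, mappings).forEach(System.out::println);
    }
}
```

For the first row, this yields three samples. Two are `cpu_util` samples labeled `mode=user` and `mode=system`, and one is an unlabeled `mem_util` sample. Their values are about 0.3001, 0.2077, and 0.9621.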
