update docs
diff --git a/Docs/index.md b/Docs/index.md
index 9c6c685..c92e247 100644
--- a/Docs/index.md
+++ b/Docs/index.md
@@ -1,5 +1,12 @@
-Introduction
-------------
+How to read this document
+=========================
+
+* For a fast start go directly to the **Howto install wally** section and then to the appropriate subsection of
+  the **Howto test** section
+
+
+Overview
+========
Wally is a tool to measure performance of block storages of different kinds
in distributed ways and provide comprehensive reports. It's designed to
@@ -75,15 +82,37 @@
* Generating report
* Cleanup
-How to run a test
------------------
+Wally motivation
+----------------
+
+Major testing problems and how wally fixes them for you
+
+Howto install wally
+===================
+
+Container
+---------
+
+
+Local installation
+------------------
+
+apt install g++ ....
+
+pip install XXX
+python -m wally prepare  # downloads fio and compiles it
+
+
+Howto run a test
+================
To run a test you need to prepare cluster and config file.
+How to run wally: using a container, or directly
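+
+For example, assuming a local installation, a direct run might look like this (the exact arguments of the `test`
+subcommand are an assumption - see the **CLI** section; the config path is illustrative):
+
+```
+$ python -m wally test /path/to/my_config.yaml
+```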
-Configuration file
-==================
+Configuration
+=============
* `SSHURI` - string in format [user[:passwd]@]host[:port][:key_file]
Where:
@@ -94,21 +123,27 @@
- `key_file` - str, path to ssh key private file. `~/.ssh/id_rsa` by default. In case if `port` is ommited,
but `key_file` is provided - it must be separated from host with two colums.
- `passwd` and `key_file` must not be used at the same time. Examples:
+ `passwd` and `key_file` must not be used at the same time.
- - root@master
- - root@127.0.0.1:44
- - root@10.20.20.30::/tmp/keyfile
- - root:rootpasswd@10.20.30.40
+ Examples:
+
+ - `11.12.23.10:37` - ip and ssh port, current user and `~/.ssh/id_rsa` key used
+ - `ceph-1` - only host name, default port, current user and `~/.ssh/id_rsa` key used
+ - `ceph-12::~/.ssh/keyfile` - current user and `~/.ssh/keyfile` key used
+ - `root@master` - login as root with `~/.ssh/id_rsa` key
+ - `root@127.0.0.1:44` - login as root, using 44 port and key from `~/.ssh/id_rsa`
+ - `user@10.20.20.30::/tmp/keyfile` - login as `user` using key from `/tmp/keyfile`
+ - `root:rootpassword@10.20.30.40` - login as root using `rootpassword` as an ssh password
* `[XXX]` - list of XXX type
* `{XXX: YYY}` - mapping from type XXX(key) to type YYY(value)
-
+* `SIZE` - integer number, with or without one of the usual K/M/G/T/P suffixes. Be aware that base 1024 is used,
+ so 10M really means 10MiB == 10485760 bytes, and so on.
-default settings
+Default settings
----------------
-Many config settings already has usable default values in config-examples/default.yaml file and
+Many config settings already have usable default values in the `config-examples/default.yaml` file and
in most cases use can reuse them. For those put next include line in you config file:
`include: default.yaml`
@@ -141,7 +176,7 @@
Example: `results_storage: /var/wally`
-* `sleep`: int
+* `sleep`: int, default is zero
Tell wally to do nothing for X seconds. Useful if you only need to collect sensors.
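+
+ Example (illustrative value): `sleep: 600`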
@@ -171,34 +206,446 @@
* `ip_remap`: {IP: IP}
- Used in case if OSD and Monitor nodes registered in ceph using internal ip addresses, which is not visible from
- master node. Allows to map non-routable ip addresses to routable. Example:
+ Use this in case OSD and Monitor nodes are registered in ceph via internal ip addresses, which are not visible from
+ the node where you run wally. Allows mapping non-routable ip addresses to routable ones. Example:
- ```
+```yaml
ip_remap:
10.8.0.4: 172.16.164.71
10.8.0.3: 172.16.164.72
10.8.0.2: 172.16.164.73
- ```
+```
+
+Example:
+
+```yaml
+ceph:
+ root_node: ceph-client
+ cluster: ceph # << optional
+ ip_remap: # << optional
+ 10.8.0.4: 172.16.164.71
+ 10.8.0.3: 172.16.164.72
+ 10.8.0.2: 172.16.164.73
+```
+
+Section `openstack`
+-------------------
+
+Provides openstack settings, used to discover the OS cluster and to spawn/find test vms.
+
+* `skip_preparation`: bool
+
+ Default: `false`. Wally needs a prepared openstack cluster to spawn virtual machines. If your OS cluster was
+ prepared previously, you can set this to `true` to save some time on checks.
+
+* `openrc`: either str or {str: str}
+
+ Specify source for [openstack connection settings].
+
+ - `openrc: ENV` - get OS credentials from environment variables. You need to export the openrc settings
+ before starting wally, like this:
+ ```
+ $ source openrc
+ $ RUN_WALLY
+ ```
+
+ or
+
+ ```
+ $ env OS_USER=.. OS_PROJECT=.. RUN_WALLY
+ ```
+
+ - `openrc: str` - use openrc file, located at provided path to get OS connection settings. Example:
+ `openrc: /tmp/my_cluster_openrc`
+
+ - `openrc: {str: str}` - provide connection settings directly in config file.
+
+ Example:
+
+```yaml
+ openrc:
+ OS_USERNAME: USER
+ OS_PASSWORD: PASSWD
+ OS_TENANT_NAME: TENANT
+ OS_AUTH_URL: URL
+```
+
+* `insecure`: bool - override the OS_INSECURE setting provided in the `openrc` section.
+
+* `vms`: [SSHURI]
+ List of vm sshuri, except that a vm name prefix is used instead of a hostname/ip. Wally will find all vms
+ whose names start with this prefix and use them as test nodes.
+
+ Example:
+
+```yaml
+ vms:
+ - wally@wally_vm
+ - root:rootpasswd@test_vm
+```
+
+This will find all vms named like `wally_vm*` and `test_vm*` and try to reuse them for the test with the provided
+credentials. Note that by default wally uses the openstack ssh key for vms, not `~/.ssh/id_rsa`. See the
+**Openstack vm config** section for details.
+
+* VM spawning options. These options control how many new vms to spawn for the test and what profile to use;
+  see the sketch after this list. All spawned vms automatically get the `testnode` role and are used for tests.
+  Wally tries to spawn vms evenly across all compute nodes, using anti-affinity groups.
+
+  - `count`: str or int. Controls how many vms to spawn, possible values:
+    - `=X`, where X is an int - spawn as many vms as needed to make the total testnode count not less than X.
+      For example, if you already have 1 explicit test node provided via `nodes` and wally found 2 vms left
+      from a previous test run, then with `count: =4` wally will spawn one additional vm.
+    - `X`, where X is an integer. Spawn exactly X new vms.
+    - `xX`, where X is an integer. Spawn X vms per compute.
+      Example: `count: x3` - spawn 3 vms per compute.
+
+  - `cfg_name`: str, vm config. By default only the `wally_1024` config is available. This config uses the image from
+    `https://cloud-images.ubuntu.com/trusty/current/trusty-server-cloudimg-amd64-disk1.img` as the vm image,
+    1GiB of ram, 2 vCPU and a 100GiB volume. See **Openstack vm config** for details.
+
+  - `network_zone_name`: str. Network pool for internal ip v4. Usually `net04`
+  - `flt_ip_pool`: str. Network pool for floating ip v4. Usually `net04_ext`
+  - `skip_preparation`: bool, false by default. Before spawning vms wally checks that all required prerequisites,
+    like vm flavour, image, aa-groups and ssh rules, are ready and creates them if something is missing. This option
+    tells wally to skip that stage. You may set it if you are sure that openstack is prepared and want to save some
+    time on this stage, but it is better to keep it off to prevent issues.
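+
+A rough sketch of these spawning options inside the `openstack` section (the exact nesting is an assumption - check
+`config-examples/default.yaml`; all values are illustrative):
+
+```yaml
+openstack:
+    openrc: ENV
+    count: x1                    # one new vm per compute node
+    cfg_name: wally_1024         # the default vm config
+    network_zone_name: net04     # internal ip v4 pool
+    flt_ip_pool: net04_ext       # floating ip v4 pool
+```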
+
Section `nodes`
---------------
-{SSHURI: [str]} - contains mapping of sshuri to list of roles for selected node. Helps wally in case if it can't
-detect cluster nodes. Also all testnodes are provided via this section and at least one node with role testnode
-must be provided. Example:
+{SSHURI: [str]} - mapping of `sshuri` to a list of roles for the selected node. Helps wally in case it can't
+detect cluster nodes. Also all testnodes are provided via this section, except for reused VMs.
-```
+Example:
+
+```yaml
nodes:
user@ceph-client: testnode
```
+*Note: you need to define at least one node with the `testnode` role here, unless you reuse VMs in the `openstack` section*
+
Section `tests`
---------------
+
+This section defines the list of test suites to be executed. Each list item is a map from suite type to suite config.
+See details for the different suites below.
+
+fio suite config
+----------------
+* `load`: str - required option, the name of the load profile.
+
+ By default the following profiles are available:
+
+ - `ceph` - for all kind of ceph-backed block devices
+ - `hdd` - local hdd drive
+ - `cinder_iscsi` - cinder lvm-over-iscsi volumes
+ - `check_distribution` - check how IOPS/latency are distributed
+
+ See **fio task files** section for details.
+
+* `params`: {str: Any} - parameters for the load profile.
+ Subparams:
+ - `FILENAME`: str, required by all profiles. It will be used as the test file for fio.
+ In case the test file name differs between test nodes, you need to create (sym)links with the same
+ name on all of them before starting the test and use the link name here.
+ - `FILESIZE`: SIZE, file size parameter. While in most cases wally can correctly detect the device/file size,
+ often you don't need to test the whole file. Also this parameter is required if the file doesn't exist yet.
+
+ Non-standard loads may need some additional parameters, see the **fio task files** section for details.
+
+* `use_system_fio`: bool, false by default. Tells `wally` to use the testnode's local fio binary instead of the one
+ shipped with wally. You might need this in case wally has no prebuilt fio for your distribution. By default it's
+ better to use wally's fio, as the one shipped with the distribution is often outdated. See
+ **Howto**/`Use your fio binary` for details.
+
+* `use_sudo`: bool, false by default. Wally will run fio on testnodes with sudo. This is often required if your local
+ testnode user is not root, but you need to test a device.
+
+* `force_prefill`: bool, false by default. Tells wally to unconditionally fill the test file/device with
+ pseudo-random data before the test. By default wally first checks whether the target already contains random data
+ and skips the filling step. On this step wally fills the entire device, so it might take a long time.
+
+* `skip_prefill`: bool, false by default. Forces wally not to fill the target with pseudorandom data. Use this if you
+ are testing a local hdd/ssd/cinder iscsi volume, but not if you are testing a ceph-backed device or any device
+ backed by a system with delayed space allocation.
+
+Example:
+
+```yaml
+ - fio:
+ load: ceph
+ params:
+ FILENAME: /dev/vdb
+ FILESIZE: 100G
+```
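+
+For reference, a sketch of a fio suite entry that also sets the optional flags described above (all values are
+illustrative, not recommendations):
+
+```yaml
+  - fio:
+        load: hdd
+        params:
+            FILENAME: /opt/some_test_file.bin
+            FILESIZE: 10G
+        use_system_fio: false    # use the fio binary shipped with wally
+        use_sudo: true           # needed if the testnode user is not root
+        skip_prefill: true       # ok for a local hdd/ssd, not for ceph-backed devices
+```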
+
+Key `test_profile`: str
+-----------------------
+
+This section allows using a predefined set of settings for spawning VMs and running tests.
+Available profiles with their settings are listed in the `config-examples/default.yaml` file.
+The following profiles are available by default:
+
+* `openstack_ceph` - spawn 1 VM per compute and run the `ceph` fio profile against /dev/vdb
+* `openstack_cinder` - spawn 1 VM per compute and run the `ceph_iscsi_vdb` fio profile against /dev/vdb
+* `openstack_nova` - spawn 1 VM per compute and run the `hdd` fio profile against /opt/test.bin
+
+Example:
+
+```yaml
+include: default.yaml
+discover: openstack,ceph
+run_sensors: true
+results_storage: /var/wally_results
+
+ceph:
+ root_node: localhost
+
+openstack:
+ openrc: ENV # take creds from environment variable
+
+test_profile: openstack_ceph
+```
+
+CLI
+===
+.....
+Test suites description and motivation
+======================================
+
+Other useful information
+========================
fio task files
--------------
+Openstack vm config
+-------------------
+image/flavour/ssh keys, etc
+
+Howto test
+==========
+
+Local block device
+------------------
+Use `config-examples/local_block_device.yml` as a template. Replace `{STORAGE_FOLDER}` with the path to the folder
+where the result directory should be put. Make sure that wally has read/write access to this folder, or can create it.
+You can either test a device directly, or test a file on an already mounted device. Replace
+`{STORAGE_DEV_OR_FILE_NAME}` with the correct path. In most cases wally can detect the file or block device size
+correctly, but it is usually better to set `{STORAGE_OR_FILE_SIZE}` directly. The larger the file you use, the less
+the various caches affect the result, but also the longer the initial filling with pseudo-random data takes.
+
+Example of testing the `sdb` device:
+
+```yaml
+include: default.yaml
+run_sensors: false
+results_storage: /var/wally
+
+nodes:
+ localhost: testnode
+
+tests:
+ - fio:
+ load: hdd
+ params:
+ FILENAME: /dev/sdb
+ FILESIZE: 100G
+```
+
+Example of testing a device mounted to the `/opt` folder:
+
+```yaml
+include: default.yaml
+run_sensors: false
+results_storage: /var/wally
+
+nodes:
+ localhost: testnode
+
+tests:
+ - fio:
+ load: hdd
+ params:
+ FILENAME: /opt/some_test_file.bin
+ FILESIZE: 100G
+```
+
+**Be aware that wally will not remove the file after the test completes.**
+
+Ceph without openstack, or other NAS/SAN
+----------------------------------------
+
+Wally supports only rbd/cephfs testing; object protocols, such as rados and RGW, are not supported.
+Cephfs testing doesn't require any special preparation except the usual mounting on test nodes, consult
+[ceph fs quick start] for details.
+
+Ceph linear read/write is usually limited by the network. For example, if you have 10 SATA drives used as storage
+drives in your cluster, the aggregated linear read speed can reach ~1GiB/s, i.e. about 8Gbit/s, which is close to the
+limit of a 10Gbit network. So unless you have a test node with a wide enough network, it's usually better to test the
+ceph cluster from several test nodes in parallel.
+
+Ceph generally has low performance at low QD, as in this mode you work with only one OSD at a time.
+Meanwhile ceph can scale to much larger QD values than hdd/ssd drives, as in this case you spread IO requests
+across all OSD daemons. You need up to (16 * OSD_COUNT) QD for 4k random reads and about
+(12 * OSD_COUNT / REPLICATION_FACTOR) QD for 4k random writes to touch the cluster limits.
+For other block sizes and modes you might need different settings. You don't need to care about this if you are using
+the default `ceph` profile.
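+
+For example, on a hypothetical cluster with 30 OSDs and a replication factor of 3, these formulas give up to
+16 * 30 = 480 QD for 4k random reads and about 12 * 30 / 3 = 120 QD for 4k random writes.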
+
+There are three ways of testing RBD - directly, by mounting it on a node using [krbd], and via a virtual machine
+with a volume provided by the rbd driver built into qemu. For the last one consult the **Ceph with openstack** section
+or the documentation for your hypervisor.
+
+**TODO: add example**
+
+Using testnode mounted rbd device
+---------------------------------
+
+First you need a pool to be used as the target for rbd. You can use the default `rbd` pool, or create your own for
+the test. The pool needs to have many PGs to have good performance. A ballpark estimate is
+(100 * OSD_COUNT / REPLICATION_FACTOR). After creation ceph may warn about "too many PGs"; this message can be safely
+ignored. Here is the ceph documentation: [PLACEMENT GROUPS].
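+
+For example, a hypothetical cluster with 30 OSDs and a replication factor of 3 gives roughly 100 * 30 / 3 = 1000 PGs;
+in practice you would round this to a nearby power of two, such as 1024.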
+
+* Create a pool (consult [ceph pools documentation] for details).
+```bash
+ $ ceph osd pool create {pool-name} {pg-num}
+```
+
+* Wait till creation completes and all PGs become `active+clean`.
+* Create an rbd volume in this pool. The volume size needs to be large enough to mitigate the unavoidable FS caches
+ on the OSD nodes. Usually (SUM_RAM_SIZE_ON_ALL_OSD * 3) works well and results in only ~20% cache hits on reads:
+
+```bash
+ $ rbd create {vol-name} --size {size} --pool {pool-name}
+```
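+
+For example, for a hypothetical cluster of 5 OSD nodes with 64GiB of RAM each, this rule suggests a volume of about
+5 * 64GiB * 3 = 960GiB.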
+
+* Map the rbd volume via the kernel rbd device. This is a tricky part. Kernels usually have an old version of the rbd
+ driver and don't support the newest rbd features, which results in errors during mapping. First try to map the rbd
+ device:
+
+```bash
+ $ rbd map {vol-name} --pool {pool-name}
+```
+
+If it fails, you need to run `rbd info --pool {pool-name} {vol-name}` and disable unsupported features via
+`rbd feature disable --pool {pool-name} {vol-name} {feature name}`. Then try to map it once again.
+
+* wally needs to have read/write access to the resulting rbd device.
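+
+A sketch of a wally config for the mapped device (the device path and size are illustrative - use whatever path
+`rbd map` printed and pick `FILESIZE` according to the sizing advice above):
+
+```yaml
+include: default.yaml
+results_storage: /var/wally
+
+nodes:
+    localhost: testnode
+
+tests:
+  - fio:
+        load: ceph
+        params:
+            FILENAME: /dev/rbd0
+            FILESIZE: 960G
+```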
+
+Direct rbd testing
+------------------
+Direct rbd testing runs via the rbd driver built into fio. Using this driver fio can generate
+requests to RBD directly, without an external rbd driver. This is the fastest and the most reliable way
+of testing RBD, but with the internal rbd driver you bypass some code layers which may be used in a production
+environment. The fio version shipped with wally has no rbd support, as it can't be built statically. In order to use
+this mode you need to build fio with rbd support, see the **Use your fio binary** part of the **Howto** section for
+instructions.
+
+**TODO**
+
+Ceph with openstack
+-------------------
+
+The easiest way is to use the predefined `openstack_ceph` profile. It spawns one VM per compute node and runs the
+`ceph` test suite on all of them.
+
+Example:
+
+```yaml
+include: default.yaml
+discover: openstack,ceph
+run_sensors: true
+results_storage: /var/wally_results
+
+ceph:
+ root_node: localhost
+
+openstack:
+ openrc: ENV # take creds from environment variable
+
+test_profile: openstack_ceph
+```
+
+Cinder lvm volumes
+------------------
+
+Howto
+=====
+
+* Use your fio binary
+
+ You need to download the fio sources, compile them for the linux distribution of the test nodes, compress the
+ binary with bz2, name it `fio_{DISTRNAME}_{ARCH}.bz2` and put it into the `fio_binaries` folder. `ARCH` is the
+ output of the `arch` command on the target system. `DISTRNAME` should be the same as the `lsb_release -c -s` output.
+
+ Here are the typical steps to compile the latest fio from master:
+
+```bash
+ $ git clone https://github.com/axboe/fio.git
+ $ cd fio
+ $ ./configure --build-static
+ $ make -jXXX  # Replace XXX with your CPU core count to decrease compilation time
+ $ bzip2 fio
+ $ mv fio.bz2 WALLY_FOLDER/fio_binaries/fio_DISTRO_ARCH.bz2
+```
+
+Storage structure
+=================
+
+Wally saves all input configurations, all collected data and test results into a single subfolder of the directory
+given by the `results_storage` setting. All files are either csv (results/sensor files), yaml/js for configuration
+and non-numeric information, png/svg for images, and a couple of raw text files like logs and some outputs.
+
+Here is a description what each file contains:
+
+* `cli` - txt, wally cli in semi-raw format
+* `config.yaml` - yaml, full final config, built from the original wally config passed as a cli parameter, by
+ processing all replacements and inclusions.
+* `log` - txt, wally execution log. Merged log of all wally runs for this test including restarts and report
+ generations.
+* `result_code` - yaml, contains the exit code of the last wally execution with the 'test' subcommand on this folder.
+* `run_interval` - yaml, the [begin_time, end_time] interval of the last wally execution with the 'test' subcommand
+ on this folder.
+* `meta` - folder. Contains cached values for statistical calculations.
+* `nodes` - folder, information about test cluster
+ - `all.yml` - yaml. Contains information for all nodes, except for node parameters
+ - `nodes/parameters.js` - js. Contains node parameters. Parameters are stored separately, as they can be very large
+ for ceph nodes, and js files are parsed much faster in python than yaml.
+* `report` - folder, contains report html/css files and all report images. Can be copied to other place.
+ - `index.html` - report start page.
+ - `main.css` - report css file.
+ - `XXX/YYY.png or .svg` - image files for report
+* `results` - folder with all fio results
+ - `fio_{SUITE_NAME}_{IDX1}.yml` - yaml, full config for each executed suite.
+ - `fio_{SUITE_NAME}_{IDX1}.{JOB_SHORT_DESCR}_{IDX2}` - folder with all data for each job in suite
+ * `{NODE_IP}:{NODE_SSH_PORT}.fio.{TS}.(csv|json)` - fio output file. TS is the parsed timeseries name - either
+ `bw`, `lat`, or `stdout` for the raw output.
+ * `info.yml` -
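+
+Putting this together, an illustrative layout of a single result folder might look like this (suite, job and node
+names are hypothetical):
+
+```
+cli
+config.yaml
+log
+result_code
+run_interval
+meta/
+nodes/
+    all.yml
+    parameters.js
+report/
+    index.html
+    main.css
+results/
+    fio_ceph_0.yml
+    fio_ceph_0.rand_read_4k_1/
+        172.16.164.71:22.fio.bw.csv
+        172.16.164.71:22.fio.lat.csv
+        info.yml
+```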
+
+Development docs
+================
+
+Source code structure
+---------------------
+
+Source code style
+-----------------
+
+Tests
+-----
+
+v2 changes
+==========
+....
+
+wishful thinking about v3
+-------------------------
+
+
+[ceph fs quick start]: http://docs.ceph.com/docs/master/start/quick-cephfs/
+[PLACEMENT GROUPS]: http://docs.ceph.com/docs/master/rados/operations/placement-groups/
+[ceph pools documentation]: http://docs.ceph.com/docs/kraken/rados/operations/pools/