TEST wrapper

This is very simple tests wrapper, with which you can run many tests compare results, draw plots, calculate metrics.

Nothing fancy but does the work …

Use it mainly in your python shell.

You can also use DataTest as example of how to initialize and connect Encoder and TM.


%matplotlib inline

import sys
sys.path.append('../lib')
sys.path.append('../lib/test')
import numpy as np
from utils import *
from data_test import *
DataSet.list_datasets()
['hot-gym', 'ny', 'sine', 'dollar']
dt = DataTest(data_set='ny')
dt.new_test(name='5x300', data_size=300, nrows=5)
sc> building classification map ...
dt.run_test(name='5x300', begin=0, end=1000)
Adding data metrics with name : 5x300 
dt.results.metrics
{'5x300': {'avg_fill': 0.027104,
  'fill': 0.027104,
  'mae': 2506.9769999999999,
  'mape': 0.17050835420385166,
  'max-fill': 0.094,
  'nll': 1.495726981091561,
  'r2': 0.5512082101100626,
  'rmse': 4477.3462467180261}}

!! NLL (negactive log likelihood score uses 1000 bins for calulation by default)

dt.tests['5x300'].plot_data()
../../_images/TM_test_8_0.png
dt.tests['5x300'].tm.memory.show_mem()
../../_images/TM_test_9_0.png
dt.tests.keys()
['5x300']
dt.results.series['5x300']['ys'][100:110]
array([8416, 7098, 5826, 4383, 3270, 2948, 3146, 3077, 3000, 4592])
dt.results.series['5x300']['yhat'][100:110]
array([ 8000,  6150, 11000,  4750,  6550, 17250,  4350,  2450,  2600,  2600])
dt.results.tests_list()
{'5x300': {'metrics': {'avg_fill': 0.027104,
   'fill': 0.027104,
   'mae': 2506.9769999999999,
   'mape': 0.17050835420385166,
   'max-fill': 0.094,
   'nll': 1.495726981091561,
   'r2': 0.5512082101100626,
   'rmse': 4477.3462467180261},
  'size': 1000}}

dt.save() - saves all the objects tests, results, tm,classifier, encoder and data … later any one of those can be restored separately and ClassXX.load()


On the fly

You can also generate the data set on the fly, just use as first character dot and then PatternLang() spec.

dt = DataTest(data_set=".(12,(5:1000:22))*20")
dt.new_test(name='5x300', nrows=5, data_size=300)
dt.data.data.size
940
dt.run_test(name='5x300',begin=0,end=940)
Adding data metrics with name : 5x300 
dt.first.plot_data()
../../_images/TM_test_20_0.png

You can also see how “jumpines” shows up if don’t have enough granularity. f.e. change 1000 => 100 in the pattern i.e. *(12,(5:100:22))20

Default granularity for PatternLang pattern is 10.


Scalar classifier

Scalar classifier uses ScalarEncoder to generate a list of bitstrings. Later when you call best_match() it will find the closest value (by overlap).

from scalar_classifier import *
from scalar_encoder import *
se = ScalarEncoder(maximum=1000,nbits=50)
sc = ScalarClassifier(encoder=se)
se.info
> Scalar encoder -----
min-max/range : 0-1000/1000
buckets,width,n : 46,5,50
sc.build_cmap(start=0, end=1000, step=10)
sc> building classification map ...
sc.info
> Scalar Classifier ==============================
range,step: 0 - 1000 , 10
> Scalar encoder -----
min-max/range : 0-1000/1000
buckets,width,n : 46,5,50
> 2D Bitmap -----
Dims in bits : 100,50
Mem use : 0.00Mb
Fill %: 0.10%
Avg item % fill: 0.10%
Max item % fill: 0.10%
sc.best_match(55)
50
sc.best_match(45)
50
sc.best_match(40)
30
sc.best_match(35)
30
sc.best_match(30)
30
sc.best_match(25)
30
sc.best_match(20)
0

I also wrote small script to test different combination of params for ScalarClassifier (I intend to replace it with something more sophisticated, but for now SC was the easier way to start testing).

$ ./test_scalar_classifier.py -h 1

Example :
./test_scalar_classifier.py -b 100 -n 0 -x 1000 -s 10 -w 5

Help :
	-n minimum
	-x maximum
	-s step
	-b number of bits
	-w SE bit width

$ ./test_scalar_classifier.py -b 300 -n 0 -x 1000 -s 10 -w 6

> Scalar Classifier ==============================
range,step: 0 - 1000 , 10
> Scalar encoder -----
min-max/range : 0-1000/1000
buckets,width,n : 295,6,300
> 2D Bitmap -----
Dims in bits : 101,300
Mem use : 0.00Mb
Fill %: 0.02%
Avg item % fill: 0.02%
Max item % fill: 0.02%

MAPE: 0.0052012012012
MAE: 2.598
RMSE: 3.07896086367
NLL: 0.500050304595

Loading generated data

Once you .save() a test you can load any object or the result stats and work with them.

r = Results.load('../tmp/dt/hot-gym/272153-4390_results.res') r.test_list() dt.first.plot_data(ys=r.series['5x300']['ys'],yhat=r.series['5x300']['yhat'])

You need instantiated DataTest object, so you can call plot_data() OR you can do it manually you have the data at your disposal. The benefit of using the Results directly is that when you save the image as .png or .svg, the result is not that good and there is no good viewer to zoom&pan as the native matplotlib-viewer.

Or better yet, just use this viewer I created : bin/res_viewer.py ../tmp/dt/180057-15000_results.res

Running batch of tests

Primary use to run multiple tests from command line. Stores the results and objects in the /tmp directory. Can later be restored and reused.

Example :
	python run_test_suite.py -d ny -r 5,8 -c 100,200 -w 0.02 -s 0 -e 100
 Test suite params :
	-d : data set : ny, dollar, hot-gym, sine
	-r : number of rows - a list
	-c : number of cols - a list
	-w : winners percent, controls sparsity - a list
	-b : in data set what position to start
	-e : in data set what position to end

Runs all the combinations .. 5x100, 5x200, 8x100, 8x200 … -w too