iSDP : indexed Semantic Distributed Pointers aka Symbols

Before we start, let me mention that the isdp.py module has two interfaces : low level functions in isdp and Object oriented in iSDP().

iSDP class uses isdp functions for the basic functionality. As I already mentioned the whole project is based on indexed SDP:SDR, not on binary SDP:SDR. An iSDP is a numpy array which holds the indexes of the bits of a binary that have a value of ONE. (It is literally extending the numpy array class.)

In addition it also has a attribute called “vsize” which sets the virtual size of the binary i.e. what is the maximum possible value of a index.

spa and spa_nbits reflect the sparsity.

import sys
sys.path.extend(['../../ilib'])
from isdp import *

We can create iSDP vector by specifying the virtual-size (vsize), and as a second parameter vsn pass a value (which can be a list of numbers, numpy array or another iSDP), sparsity % (as a float number) OR number of bits which are 1s (as an integer).

val = iSDP(vsize=1000, vsn=0.02)
print val.vsize
print val.spa
print val.spa_nbits
print val
1000
0.02
20
15:124:159:190:227:230:279:293:364:431:449:497:572:700:743:751:771:775:847:947

Shows us which bits are ON and the maximum is 1000.

So we can create symbols and apply operations and transformations on them.

square = iSDP(vsize=1000, vsn=0.02)
circle = iSDP(vsize=1000, vsn=0.02)
rectangle = iSDP(vsize=1000, vsn=0.02)
dot = iSDP(vsize=1000, vsn=0.02)

Overlap vs similarity

There are two ways of figuring how similar two symbols are !

The first one is OVERLAP which is the count of 1-bits that are common between the symbols.

Definitionaly it is count(A AND B)

square / circle
1

… the overlap between the newly created symbols is ZERO or very small, which is what we may expect. We can generate new symbols all day long but the overlap between any two of them is guaranteed to be small as long as the SDP is big enough and sparse enough.

Said it another way the symbols are ORTOGONAL

The second way to check similarity is to divide the OVERLAP to the count of the bits of one of the operands (left one in our case)

  sim = olap(a,b) / count_ones(a)

this type of similarity is percentage rather than a number. Also it is asymetric if the operands have different sparsity.

More on this in a moment…

square // circle
0.050

BASIC Operations

There are three basic operations we can apply on SDP:SDR’s : union, concatenation and thinning

Here are what distinguish them :

  1. UNION : as the name implies this op combines all the 1-bits. This is logical OR operation. What is important to remember about it is that vsize is preserved but the sparsity decreases (i.e. the SDP becomes more dense, more 1-bits)

  2. CONCATENATION : joins the SDP’s, but also shifts the bits positions, so that every operand “moves” to the right in a sense in its own segment. Result, vsize increases and the sparsity stays the same.

  3. THINNING : picks randomly bits from the operands with equal probability. vsize and sparsity stay the same. The other option is to use Context Dependent Thinning (CDT) isdp.cdt().

Lets look in more detail :

UNION

figure = square | circle
print figure
print figure.vsize, figure.spa, figure.spa_nbits
print circle / figure #overlap
print circle // figure  #similarity
print figure / circle
print figure // circle
7:11:80:81:100:128:160:219:248:253:255:260:264:345:355:463:488:494:607:612:630:632:634:639:641:652:655:670:680:693:777:821:874:885:906:907:908:952:979
1000 0.039 39
20
1.0
20
0.512820512821

As you can see the density went up from 0.02 to 0.04 (number of 1-bits from 20 to 40) i.e. sparsity went down.

Then we check how similar one of the included operand is to the union :

  • case 1 : ‘circle’ is included fully in the ‘figure’ SDP, so when it is on the left side the similarity is 100% with olap 20 bits

  • case 2 : the other way around how similar is ‘figure’ to a ‘circle’, just 50%, the olap is still 20 bits

this is what I meant when I said similarity is asymmetric i.e. how much a ‘circle’ is a ‘figure’ in comparison to how much a ‘figure’ is a ‘circle’.

It totally make sense because human Concepts are the same way. Euclidean and cosine distance used in NN are not natural they are symmetric.

Concatenation

figure = square + circle
print "square: ", square
print "circle: ", circle
print "figure: ", figure
print figure.vsize, figure.spa, figure.spa_nbits
square:  7:11:81:100:219:248:264:345:488:607:630:632:634:639:641:655:670:777:821:906
circle:  80:128:160:219:253:255:260:355:463:494:612:652:680:693:874:885:907:908:952:979
figure:  7:11:81:100:219:248:264:345:488:607:630:632:634:639:641:655:670:777:821:906:1080:1128:1160:1219:1253:1255:1260:1355:1463:1494:1612:1652:1680:1693:1874:1885:1907
1908:1952:1979

2000 0.02 40

So as I mentioned the vsize increased, the sparsity stays the same … but the circle part of the concatenation was shifted by 1000 bits.

One drawback of this operation is that you can not compare the result with the constituents SDP’s because now they have different sizes ;(

Context Dependent Thining (CDT)

figure = square * circle
print figure
print figure.vsize, figure.spa, figure.spa_nbits
print circle / figure
print circle // figure
print figure // circle
81:100:128:160:219:248:253:255:264:345:355:494:634:639:670:693:777:821:885:979
1000 0.02 20
10
0.5
0.5

Hmmm … CDT what the heck is this ?

(read any paper by Dmitri A. Rachkovskij to understand more)

The idea is to preserve the sparsity and the vsize. The consequence is that we can only pick part of the bits of the operands, in our case we are picking bits randomly (isdp.thin()).

Functionally you can think of it as a sort of compression. The number of bits that ends up in the result is proportional to the number of the items to be thinned i.e. 1/n.

The SDP:SDR does not have bind operation as purported by the VSA ‘specs’, thinning in some ways is better and in some ways worse.

Binding ‘mutates’ the operands so that you have to use a ‘reverse-bind’ operation to check if an item is part of the binding.

That is good when you need to build structures with the same components or Key:Value pairs, but is opaque and requires unwinding.

Thinning is more ‘holistic’ i.e. you can see items directly w/o additional operation, but if you want to ‘mutate’ items like in binding you have to permute the vectors before ‘thinning’. You can not directly build Key:Value pairs.

Abstraction: Binding allows easy Dicts, Thinning allows easy Sets.

So pick your poison :)

Allow me to make one additional remark, HTM theory sort of rebukes the need of binding. In fact the idea of symbol BINDING has puzzled alot of people, because it doesn’t make sense when you think of how the way the brain work. Possible solution is POOLING, SPATIAL and TEMPORAL.

Let try another way of doing thinning :

union = square | circle | rectangle
figure = union << 0.02 #thinning shrink the spa_nbits
# or : figure = union << 20
print figure
print 'union : ', union.vsize, union.spa, union.spa_nbits
print 'figure: ', figure.vsize, figure.spa, figure.spa_nbits
print circle / figure
print figure / circle
print circle // figure
print figure // circle
80:81:128:145:160:253:255:345:425:494:607:612:630:692:777:806:821:885:906:908
union :  1000 0.058 58
figure:  1000 0.02 20
9
9
0.45
0.45

The left-shift << operator can be used to thin a symbol to specific sparsity (instead of using isdp.thin())

And we also see that the part of the items are proportional i.e. 1/3 =~ 0.33% similarity

Those were the 3 basics ops, lets see what else we can do.

More operations

Permutation/Shift

print square
print square >> 10
print square >> -10
7:11:81:100:219:248:264:345:488:607:630:632:634:639:641:655:670:777:821:906
17:21:91:110:229:258:274:355:498:617:640:642:644:649:651:665:680:787:831:916
997:1:71:90:209:238:254:335:478:597:620:622:624:629:631:645:660:767:811:896

Currently I implemented only shift which is a kind of permutation.

Permutation is used when we want to obfuscate a symbol.

F.e. let say we want to build key:value pairs :

slot_father = 2
slot_son = 4

father = iSDP(vsize=1000, vsn=0.02)
son = iSDP(vsize=1000, vsn=0.02)

vader = iSDP(vsize=1000, vsn=0.02)
luke = iSDP(vsize=1000, vsn=0.02)

#key:value
parent = vader >> slot_father
child = luke >> slot_son

print "vader is parent ?" 
print vader in parent
print (vader >> slot_father) in parent

print "family = ((father,vader), (son,luke))" 
family1 = parent | child
print vader in family1
print (vader >> slot_father) in family1
print (vader >> slot_father) // family1
print family1.spa

print "family = (<father, vader>, <son,luke>)"
p1 = father * vader
p2 = son * luke
family2 = p1 | p2

print vader in family2
print vader // family2
print family1.spa
vader is parent ?
False
True
family = ((father,vader), (son,luke))
False
True
1.0
0.04
family = (<father, vader>, <son,luke>)
True
0.55
0.04

OR a list with repeated elements …

Sequence

ary = (circle >> 1) | (square >> 2) | (rectangle >> 3) | (square >> 4) | square

print square  in ary
print (square >> 1) in ary
print (square >> 2) in ary
print (square >> 4) in ary
True
False
True
True

I’ve build a Sequence class for which you can learn more in the Lexicon part of the documentation