TSTree.st
author Claus Gittinger <cg@exept.de>
Mon, 16 May 2016 08:37:03 +0200
changeset 3842 11e10753a2a9
parent 3812 2b4268814831
child 4121 cf558f5c22f9
permissions -rw-r--r--
#DOCUMENTATION by cg class: TSTree comment/format in: #matchesForString:distance:limitNodes:do: changed: #at:ifAbsent: #removeKey:ifAbsent:
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
2282
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
     1
"{ Package: 'stx:libbasic2' }"
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
     2
3668
cd42dff33054 #DOCUMENTATION
Claus Gittinger <cg@exept.de>
parents: 3253
diff changeset
     3
"{ NameSpace: Smalltalk }"
cd42dff33054 #DOCUMENTATION
Claus Gittinger <cg@exept.de>
parents: 3253
diff changeset
     4
2282
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
     5
Collection subclass:#TSTree
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
     6
	instanceVariableNames:'root'
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
     7
	classVariableNames:''
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
     8
	poolDictionaries:''
2762
0532634862eb category changes
Claus Gittinger <cg@exept.de>
parents: 2468
diff changeset
     9
	category:'Collections-Ordered-Trees'
2282
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    10
!
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    11
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    12
!TSTree class methodsFor:'documentation'!
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    13
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    14
documentation
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    15
"
3668
cd42dff33054 #DOCUMENTATION
Claus Gittinger <cg@exept.de>
parents: 3253
diff changeset
    16
    A bunch of collection classes that are useful for building large indices of things. 
cd42dff33054 #DOCUMENTATION
Claus Gittinger <cg@exept.de>
parents: 3253
diff changeset
    17
    It's especially geared towards people using OODBs like GOODS, but can be used it in the image too: 
cd42dff33054 #DOCUMENTATION
Claus Gittinger <cg@exept.de>
parents: 3253
diff changeset
    18
    the BTree class is great for when you need to select numeric keys by range, 
cd42dff33054 #DOCUMENTATION
Claus Gittinger <cg@exept.de>
parents: 3253
diff changeset
    19
    and TSTree makes a solid basis for full-text search. 
cd42dff33054 #DOCUMENTATION
Claus Gittinger <cg@exept.de>
parents: 3253
diff changeset
    20
    TreeSet has an interesting optimized #intersection: that lets you compare two collections without 
cd42dff33054 #DOCUMENTATION
Claus Gittinger <cg@exept.de>
parents: 3253
diff changeset
    21
    looking at every item of either. 
cd42dff33054 #DOCUMENTATION
Claus Gittinger <cg@exept.de>
parents: 3253
diff changeset
    22
3253
5c005be2975e Added TSMultiTree, a variant of TSTree that allows for multiple values to be stored
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 3042
diff changeset
    23
    TSTree has similar properties as BTree for String  keys. 
5c005be2975e Added TSMultiTree, a variant of TSTree that allows for multiple values to be stored
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 3042
diff changeset
    24
    So as well as keeping them sorted, you can do efficient lookups of all 
2282
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    25
    the keys with a given prefix.  One other neat trick TSTree can do is a certain 
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    26
    amount of fuzzy matching (eg find all keys with an edit distance of 3 from 
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    27
    'foo') which makes it especially useful for spell checking and similar 
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    28
    applications.
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    29
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    30
    [author:]
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    31
        Avi Bryant
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    32
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    33
    [license:]
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    34
        Dual licensed under both SqueakL and MIT. 
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    35
        This enables both base Squeak inclusion and 100% reuse.
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    36
"
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    37
! !
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    38
3812
2b4268814831 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 3668
diff changeset
    39
!TSTree class methodsFor:'queries'!
2b4268814831 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 3668
diff changeset
    40
2b4268814831 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 3668
diff changeset
    41
treeNodeClass
2b4268814831 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 3668
diff changeset
    42
    ^ TSTreeNode
2b4268814831 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 3668
diff changeset
    43
! !
2b4268814831 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 3668
diff changeset
    44
2282
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    45
!TSTree methodsFor:'accessing'!
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    46
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    47
at: aString
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    48
	^ self at: aString ifAbsent: [self error: aString printString, ' not found']
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    49
!
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    50
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    51
at: aString ifAbsent: exceptionBlock
3842
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
    52
        |ret|
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
    53
        
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
    54
        aString isEmpty ifTrue: [^ exceptionBlock value].
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
    55
        root isNil ifTrue: [^ exceptionBlock value].
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
    56
        
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
    57
        ret := root lookupString: aString startingAt: 1.
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
    58
        ret notNil ifTrue:[^ ret].
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
    59
        ^ exceptionBlock value
2282
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    60
!
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    61
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    62
at: aString ifAbsentPut: exceptionBlock
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    63
	^ self at: aString ifAbsent: [self at: aString put: exceptionBlock value]
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    64
!
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    65
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    66
at: aString put: anObject
3253
5c005be2975e Added TSMultiTree, a variant of TSTree that allows for multiple values to be stored
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 3042
diff changeset
    67
        aString isEmpty ifTrue: [self error: 'Keys cannot be empty'].
3812
2b4268814831 #REFACTORING by stefan
Stefan Vogel <sv@exept.de>
parents: 3668
diff changeset
    68
        root isNil ifTrue: [root := self class treeNodeClass key: aString first].
3253
5c005be2975e Added TSMultiTree, a variant of TSTree that allows for multiple values to be stored
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 3042
diff changeset
    69
        root lookupString: aString startingAt: 1 insert: anObject.
5c005be2975e Added TSMultiTree, a variant of TSTree that allows for multiple values to be stored
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 3042
diff changeset
    70
        ^ anObject
5c005be2975e Added TSMultiTree, a variant of TSTree that allows for multiple values to be stored
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 3042
diff changeset
    71
5c005be2975e Added TSMultiTree, a variant of TSTree that allows for multiple values to be stored
Jan Vrany <jan.vrany@fit.cvut.cz>
parents: 3042
diff changeset
    72
    "Modified: / 26-04-2014 / 11:23:02 / Jan Vrany <jan.vrany@fit.cvut.cz>"
2282
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    73
!
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    74
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    75
removeKey: aString
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    76
	^ self removeKey: aString ifAbsent: [self error: 'Could not find key ', aString printString]
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    77
!
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    78
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    79
removeKey: aString ifAbsent: errorBlock
3842
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
    80
        | val |
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
    81
        val := root removeString:aString startingAt:1.
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
    82
        root canBeCulled ifTrue: [root := nil].
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
    83
        val notNil ifTrue:[^ val].
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
    84
        ^ errorBlock value
2282
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    85
!
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    86
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    87
values
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    88
	^ Array streamContents: [:s | self do: [:ea | s nextPut: ea]]
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    89
! !
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
    90
3012
bc770831871d added isFixedSize query
Claus Gittinger <cg@exept.de>
parents: 2762
diff changeset
    91
!TSTree methodsFor:'enumerating'!
bc770831871d added isFixedSize query
Claus Gittinger <cg@exept.de>
parents: 2762
diff changeset
    92
bc770831871d added isFixedSize query
Claus Gittinger <cg@exept.de>
parents: 2762
diff changeset
    93
do: aBlock
bc770831871d added isFixedSize query
Claus Gittinger <cg@exept.de>
parents: 2762
diff changeset
    94
        root notNil ifTrue: [root do: aBlock]
bc770831871d added isFixedSize query
Claus Gittinger <cg@exept.de>
parents: 2762
diff changeset
    95
bc770831871d added isFixedSize query
Claus Gittinger <cg@exept.de>
parents: 2762
diff changeset
    96
    "Modified: / 08-08-2010 / 15:16:53 / cg"
bc770831871d added isFixedSize query
Claus Gittinger <cg@exept.de>
parents: 2762
diff changeset
    97
! !
bc770831871d added isFixedSize query
Claus Gittinger <cg@exept.de>
parents: 2762
diff changeset
    98
bc770831871d added isFixedSize query
Claus Gittinger <cg@exept.de>
parents: 2762
diff changeset
    99
!TSTree methodsFor:'searching'!
2282
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   100
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   101
matchesForPrefix: aString
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   102
	^ Array streamContents:
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   103
		[:s |
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   104
		self matchesForPrefix: aString do:
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   105
			[:match |
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   106
			s nextPut: match]]
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   107
!
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   108
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   109
matchesForPrefix: aString do: aBlock
2468
36c571b96063 changed:
Claus Gittinger <cg@exept.de>
parents: 2282
diff changeset
   110
        aString isEmpty
36c571b96063 changed:
Claus Gittinger <cg@exept.de>
parents: 2282
diff changeset
   111
                ifTrue: [self do: aBlock]
36c571b96063 changed:
Claus Gittinger <cg@exept.de>
parents: 2282
diff changeset
   112
                ifFalse: [root notNil ifTrue: [root matchesForPrefix: aString startingAt: 1 do: aBlock]]
36c571b96063 changed:
Claus Gittinger <cg@exept.de>
parents: 2282
diff changeset
   113
36c571b96063 changed:
Claus Gittinger <cg@exept.de>
parents: 2282
diff changeset
   114
    "Modified: / 08-08-2010 / 15:17:01 / cg"
2282
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   115
!
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   116
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   117
matchesForString: aString distance: aNumber
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   118
	^ Array streamContents:
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   119
		[:s |
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   120
		self matchesForString: aString distance: aNumber do:
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   121
			[:match |
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   122
			s nextPut: match]]
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   123
!
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   124
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   125
matchesForString: aString distance: aNumber do: aBlock
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   126
	self matchesForString: aString distance: aNumber limitNodes: nil do: aBlock
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   127
!
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   128
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   129
matchesForString: aString distance: aNumber limitNodes: maxNodes do: aBlock
3842
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
   130
        | nodeCount |
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
   131
        nodeCount := 0.
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
   132
        aString isEmpty ifTrue: [^ self].
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
   133
        root isNil ifTrue: [^ self].
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
   134
        root matchesForString: aString startingAt: 1 distance: aNumber do: aBlock nodesDo:
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
   135
                [:ea |
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
   136
                nodeCount := nodeCount + 1.
11e10753a2a9 #DOCUMENTATION by cg
Claus Gittinger <cg@exept.de>
parents: 3812
diff changeset
   137
                nodeCount = maxNodes ifTrue: [^ self]]
2282
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   138
! !
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   139
3012
bc770831871d added isFixedSize query
Claus Gittinger <cg@exept.de>
parents: 2762
diff changeset
   140
!TSTree methodsFor:'testing'!
2282
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   141
3012
bc770831871d added isFixedSize query
Claus Gittinger <cg@exept.de>
parents: 2762
diff changeset
   142
isFixedSize
bc770831871d added isFixedSize query
Claus Gittinger <cg@exept.de>
parents: 2762
diff changeset
   143
    "return true if the receiver cannot grow"
2468
36c571b96063 changed:
Claus Gittinger <cg@exept.de>
parents: 2282
diff changeset
   144
3012
bc770831871d added isFixedSize query
Claus Gittinger <cg@exept.de>
parents: 2762
diff changeset
   145
    ^ false
2282
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   146
! !
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   147
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   148
!TSTree class methodsFor:'documentation'!
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   149
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   150
version_CVS
3668
cd42dff33054 #DOCUMENTATION
Claus Gittinger <cg@exept.de>
parents: 3253
diff changeset
   151
    ^ '$Header$'
2282
a723f74eaae4 initial checkin
Claus Gittinger <cg@exept.de>
parents:
diff changeset
   152
! !
3012
bc770831871d added isFixedSize query
Claus Gittinger <cg@exept.de>
parents: 2762
diff changeset
   153