182 Index
Blockchain(s) (cont.)
  private/restricted, 43
Blockchain data
  addresses, 59
  Bitcoin blockchain and Bitcoin Core, 49
  blocks, 50, 51
    Bitcoin network, 56
    Bitcoin value distribution, 57, 58
    block-height, 59
    coinbase transaction, 59
    merkle-root-hash, 58
    nonce, 59
    previous-block-header-hash, 57
    target/n-bits, 59
    time, 58
    version, 57
  clustering (see Clustering blockchain data)
  consensus-based development, 49
  documentation, 49
  flow of currency
    change-making transactions, 52
    DAG, 52, 53
    P2PKH scheme, 53–54
    transaction fees, 52–53
    UTXOs, 52
  mining, 51
  models, 47–48
  nodes, 60
  operations model, 49
  owners, 60
  transactions
    coinbase transaction, 51
    four representative block groups, 55–57
    graph, 52–54
    identifier, 51
    locktime feature, 52
    outpoint, 52
    pubkey script, 52, 55
    secondary structure, 53, 54
    sequence number, 52
    signature script, 52, 55
    value, 52
    vertices labeling, 54–55
Block-incremental CP decomposition (BICP)
  evaluation
    ALS process, 164
    alternative strategies, 163
    criteria, 163
    data sets, 163
    data updates, 163
    execution times and decomposition accuracies, 164–165
    hardware and software, 163
  redundant refinement reduction, 161–162
  update sensitive block maintenance, first phase, 160
  update sensitive refinement, first phase, 161
Block reward, 51
Block subsidy, 51
C
CANDECOMP/PARAFAC (CP) decompositions, 148–149
Center displacement k-means method (CDKM), 12
Centers reduction-based methods, 11–13
Closed-loop multistep deep clustering, 75, 85
  CCNN, 85–86
  DBC, 85
  DEC, 84–85
Cluster assignment hardening loss, 78–79
Clustering blockchain data, 48
  address merging
    bootstrap methods, 62
    co-occurring transaction inputs, 62–63
    peer host address, 63
    temporal patterns, 63–64
    transaction input–output patterns, 63
    well-known services, 64
  evaluation
    distance-based criteria, 65–67
    external, 64–65
    human-assisted criteria, 68
    internal, 65
    purity and entropy, 64
    sensitivity to cluster count, 67
    tagged data, 68
  feature extraction, 61–62
  scalability, 64
Clustering CNN (CCNN), 85–86
Clustering loss, 78
Cluster separation, 66
CluStream, 119
CMU-CERT data sets, 130
compactSize, 55
Compute Unified Device Architecture (CUDA), 5–6
Continuous Outlier Detection (COD), 123
Cryptocurrencies, 43
D
Data assignment, 13
Data clustering, 13
Data mining, 25
Data reduction algorithm to cluster large-scale data (DRFCM), 11
Data reduction-based methods, 10–11
Data sampling, 13
Data skeleton, 13
Data stream
  behaviours, 116
  data acquisition, 115
  feature space, 117
  malicious insider threat detection
    any-behaviour-all-threat, 116–117
    clustering methods (see Data stream clustering)
    in data sets, 116
  stream mining problem, 115
  threat hunting, 116
Data stream clustering
  cluster tracking, 119
  outlier detection, 120
  streaming anomaly detection, insider threat detection
    data set, 130
    deep learning, 120
    distance-based outlier detection techniques, 123–125
    DNN model, 120
    E-RAIDS approach (see Ensemble of random subspace anomaly detectors in data streams)
    feature space, 121–122
    ocSVM, 121
    RNN, 120
    XABA, 121
DBSCAN algorithm, 64
DBSCAN-based clustering models, 30
Deep clustering
  closed-loop multistep deep clustering, 75, 85
    CCNN, 85–86
    DBC, 85
    DEC, 84–85
  deep representation models, 75, 76
  joint deep clustering, 74–75, 82
    DCN, 83, 84
    FaceNet, 83
    JNKM, 84
    TAGnet, 82–83
  loss functions, 75
    autoencoder reconstruction loss, 77
    cluster assignment hardening loss, 78–79
    clustering loss, 78
    joint deep clustering loss function, 78
    types, 76–77
  sequential multistep deep clustering, 74
    deep SSC, 80
    DSC, 81
    fast spectral clustering, 79–80
    NMF+k-means, 82
  taxonomy, 74
Deep clustering network (DCN), 83, 84
Deep embedded clustering (DEC), 84–85
Deep learning (DL)
  clustering approaches (see Deep clustering)
  predictive modeling tasks, 73
  unsupervised pretraining, 73
Deep neural network (DNN), 76, 120
Deep representation (DR) models, 75, 76, 79
Deep subspace clustering (DSC), 81
Directed acyclic graph (DAG), 52, 53
Discriminatively boosted clustering (DBC), 85
Distance-based criteria
  cluster quality criteria, 65–66
  compactness and isolation, 65
  Mahalanobis distance, 67
Distance-based outlier detection techniques, 123–125
DL, see Deep learning
E
ECLUN algorithm, 38
Element-wise distances, 66
Energy-efficient distributed in-sensor-network k-center clustering algorithm with outliers (EDISKCO) algorithm
  average energy consumption, 36, 37
  clustering quality, 36, 37
  on coordinator side, 36
  memory and residual energy, 35
  on node side, 36
  SenClu, 36–37
Ensemble-based insider threat (EIT), 121
Ensemble of random subspace anomaly detectors in data streams (E-RAIDS)
  advantage, 117
  any-behaviour-all-threat, 125
  AnyOut, 117
  evaluation measures, 132–133
  experimental results
    MCOD vs. AnyOut base learner, evaluation measures, 134–138
    MCOD vs. AnyOut, voting feature subspaces, 138–139
    more than one-behaviour-all-threat detection, 141
    real-time anomaly detection, 139–141
  experimental tuning, 131–132
  feature subspaces, 117
    data repositories and survival factor, 127–128
    definition, 126
    ensemble of random feature subspaces voting, 129
    local outlier detection, 125
  framework, 125–126
  MCOD, 117
  RandSubOut, 118
  survival factor, 117–118
  vote factor, 118
F
Fast spectral clustering (FSC), 79–80
Feature space, 117
FPAlarm, 133, 134
Fuzzy c-means clustering using MapReduce framework (MRFCM), 7–8
Fuzzy c-means using MPI framework (MPIFCM), 5
G
Genetic algorithm (GA), 91
Gibbs samples, 167–168
GPU, see Graphics processing unit
GPU-based k-means method (GPUKM), 6
GPU fuzzy c-means method (GPUFCM), 6
Graph-based anomaly detection (GBAD), 121
Graphics processing unit (GPU), 14
  architecture, 6
  CUDA, 5–6
  disadvantage, memory limits, 7
  GPUFCM, 6
  GPUKM, 6
  multiprocessors, 6
  streaming processors, 6
  video and image editing, 5
Grid-based probabilistic tensor decomposition (GPTD), 166–167
H
Hadoop distributed file system (HDFS), 7, 95
HASTREAM, 32–33
Hybrid methods, 13–14
I
Input/output complexity, 104–105
Insider threat detection, 115–116
Intermediary data blow-up problem, 150
Intra- and inter-cluster similarity, 66
J
Joint deep clustering, 74–75, 82
  DCN, 83, 84
  FaceNet, 83
  JNKM, 84
  loss function, 78
  TAGnet, 82–83
Joint NMF and k-means (JNKM), 84
K
kd-tree, 11
k-means-based clustering models, 30
k-means using kd-tree structure (KdtKM), 11
Knowledge Discovery in Databases (KDD)
  dataset, 17
  evaluation and visualization, 38
KPPS algorithm, 14
Kullback–Leibler divergence, 79
L
Labeled data, 1
LiarTree algorithm, 34
LSHTiMRKM method, 13
M
Mahalanobis distance, 67
MapReduce-based k-means method (MRKM), 7
MapReduce-based k-prototypes (MRKP), 8
MapReduce model, 16
  data flow, 7, 8
  disadvantage, 9
  flowchart, 94
  fuzzy c-means clustering, 96
  HDFS, 95
  iterative algorithms, 92
  k-means method, 96
  k-prototypes, 96
  map and reduce phases, 7
  MRFCM, 7–8
  MRKM, 7
  MRKP, 8
  principal components, 94
  shuffle phase, 7
  shuffling step, 94
MCOD, see Micro-cluster-based continuous outlier detection
Merkle tree structure, 50
Message passing interface (MPI), 3, 5, 16
Micro-cluster-based continuous outlier detection (MCOD), 117, 123
  vs. AnyOut
    in evaluation measures, 134–138
    voting feature subspaces, 138–139
  centres of, 124
  experimental tuned parameters, 131–132
  micro-clusters, definition, 123
MinBatch k-means method (MBKM), 10
Miners, 51
Monte Carlo-based Bayesian decomposition, 166
MPI-based k-means (MPIKM), 5
MR-CPSO
  existing methods, 97, 98
  modules, 96–97
  shortcomings, 97–98
  vs. S-PSO, 104–105
Multiprocessors (MPs), 6
N
Noise-profile adaptive decomposition (nTD) method
  benefits, 166
  evaluation
    criteria, 170
    data sets, 169
    hardware and software, 170
    leveraging noise profiles impact, 170, 171
    noise, 169
  GPTD, 166–167
  Monte Carlo-based Bayesian decomposition, 166
  noise-sensitive sample assignment
    Gibbs samples, 167–168
    naive option, 168
    SIG-based sample assignment, 168–169
  probabilistic two-phase decomposition strategy, 165
  tensor noise, 165
Nonce, 51, 59
Nonnegative matrix factorization (NMF), 82
nTD, see Noise-profile adaptive decomposition method
O
OMRKM, 13
One class SVM (ocSVM), 121
Overlapping k-means method using Spark framework (SOKM), 10
P
Particle swarm optimization (PSO)
  algorithm, 93
  clustering method
    MapReduce model, 92, 96
    MR-CPSO (see MR-CPSO)
    using Spark (see Spark-based PSO clustering method)
  in fitness computation, 96
  hybrid method, 95–96
  personal best position, 93
  population-based optimization algorithm, 93
  social behavior of birds, 92–93
  swarm intelligence algorithms, 92
  theoretical analysis
    complexity analysis, 104–105
    time-to-start variable analysis, 105
Partitional clustering methods
  Big data analytics, 2
  efficiency, 3
  fuzzy c-means, 2
  iterative relocation procedure, 2
  k clusters, 2
  k-means, 2, 91
  k-modes, 2
  k-prototypes, 2
  for large-scale data
    empirical results, 17–20
    quality of k-means, 18
    real datasets, 17
    representative method, 16
    running time of k-means, 17–18
    simulated datasets, 17
    SSE, 18
  optimization, 1
  scalable partitional clustering methods (see Big data partitional clustering methods)
Pattern Assignment and Mean Update (PAMU), 11
Pattern Compression and Removal (PCR), 11
Pay to public key hash (P2PKH) scheme, 53–54
Personalized PageRank (PPR) scores, 159, 160
Personalized tensor decomposition (PTD), 147
  evaluation
    criteria, 175
    data set, 175
    decomposition strategies, 175
    hardware and software, 175
    results, 175–177
  foci of interest, 172
  problem formulation, 172–173
  rank assignment, 173–174
  sub-tensor rank flexibility, 173
PreDeConStream, 31–32
PRKM method, 10, 11
Pseudoanonymity, 47
PSO, see Particle swarm optimization
PTD, see Personalized tensor decomposition
R
Real-time anomaly detection system
  E-RAIDS (see Ensemble of random subspace anomaly detectors in data streams)
  RADISH, 117
Real-time stream mining problem, 115
Receiver-operator characteristic (ROC), 67
Recurrent neural network (RNN), 120
Recursive partition k-means (RPKM), 10
Resilient distributed dataset (RDD), 9, 10, 95
S
Scalable partitional clustering methods, see Big data partitional clustering methods
Semi-supervised learning, 1
Sequential multistep deep clustering, 74
  deep SSC, 80
  DSC, 81
  fast spectral clustering, 79–80
  NMF+k-means, 82
SIGs, see Sub-tensor impact graphs
Simulated annealing (SA), 91
Space complexity, 104
Spark-based k-prototypes (SKP) clustering method, 9
Spark-based methods, 9–10, 16
Spark-based PSO clustering method (S-PSO), 92
  data assignment and fitness computation step, 98–100
  environment and data sets description, 105–106
  vs. existing methods, 108
  k-means algorithm, 98
  k-means iteration step, 102–103
  methodology, 105
  vs. MR-CPSO, 104–105
  pbest and gbest update step, 101
  performance measures, 107
  position and velocity update step, 101–102
  process flowchart, 98, 99
  scalability analysis
    running time, 109, 110
    scaleup results, 109, 111
    sizeup results, 109, 112
    speedup results, 109, 111
  Time-To-Start variable impact, 108–109
Sparse subspace clustering (SSC), 80
S-PSO, see Spark-based PSO clustering method
Stochastic sub-gradient descent (SGD), 81
Stream clustering, Big data
  advanced anytime stream clustering algorithms, 34, 35
  anytime mining algorithms, 30
  budget algorithms, 30
  energy awareness and lightweight clustering, sensor data streams, 30
  energy-efficient algorithms and clustering sensor streaming data
    ECLUN algorithm, 38
    EDISKCO algorithm, 35–37
  high-dimensional density-based stream clustering algorithms
    curse of dimensionality, 30
    DBSCAN-based clustering models, 30
    HASTREAM, 32–33
    k-means-based clustering models, 30
    PreDeConStream, 31–32
    self-adjustment, 30
    subspace clustering, 30
  properties, 39, 40
  storage awareness and high clustering quality, 29
  stream changes and outlier awareness, 29
  subspace stream clustering, 38–39
Streaming data
  eye-tracking system, 28–29
  mining body-generated streaming data, 28
  multiple data collection sensors, 27, 28
  social data, 26
  static mining, 26
  streaming tweets with tags and time, 27
  wired streaming data, 27
  wireless sensor network deployment, 27–28
Streaming processors (SPs), 6