• Tidak ada hasil yang ditemukan

How to Read a Dendrogram

N/A
N/A
Reksa Gumilar

Academic year: 2024

Membagikan "How to Read a Dendrogram"

Copied!
8
0
0

Teks penuh

(1)

How to Read a Dendrogram

(Slide 1)

Reading Dendrograms

lexomics.wheatoncollege.edu

A dendrogram is a branching diagram that represents the relationships of similarity among a group of entities.

(Slide 2)

Dendrogram of Text A (cut into 1000 word chunks)

1 2 4 5 3

lexomics.wheatoncollege.edu

Here we have a basic dendrogram.

(2)

(Slide 3)

Dendrogram of Text A (cut into 1000 word chunks)

1 2 4 5 3

lexomics.wheatoncollege.edu

Clades Clade

Each branch is called a clade.

(Slide 4)

Dendrogram of Text A (cut into 1000 word chunks)

1 2 4 5 3

lexomics.wheatoncollege.edu

Clades Clade

Leaves Leaf

The terminal end of each clade is called a leaf.

(3)

(Slide 5)

1 2 4 5 3

lexomics.wheatoncollege.edu

Outliers

“simplicifolious”

Clades can have just one leaf (these are called simplicifolious, a term from botany that means “single- leafed”) or they can have more than one. Two-leaved clades are bifolious, three-leaved are trifolious, and so on. There is no limit to the number of leaves in a clade.

(Slide 6)

Dendrogram of Text A (cut into 1000 word chunks)

1 2 4 5 3

lexomics.wheatoncollege.edu

The arrangement of the clades tells us which leaves are most similar to each other. The height of the branch points indicates how similar or different they are from each other: the greater the height, the greater the difference.

We can use a dendrogram to represent the relationships between any kinds of entities as long as we can measure their similarity to each other. In Lexomic analysis, we compare the distribution of different words among whole texts or segments of texts.

In this dendrogram, we have cut a text into 5 segments—also called chunks—that are each 1000 words long.

(4)

There are two ways to interpret a dendrogram: in terms of large-scale groups or in terms of similarities among individual chunks.

To identify large-scale groups, we start reading from the top down, finding the branch points that are at high levels in the structure.

(Slide 7)

Dendrogram of Text A (cut into 1000 word chunks)

1 2 4 5 3

lexomics.wheatoncollege.edu

In this particular dendrogram, we see that chunk three is completely separate from all the others. Chunk three is therefore simplicifolious. We interpret its placement as indicating that the distribution of words in that chunk is substantially different from the distribution in the remaining chunks.

These other chunks can be clustered into groups.

(Slide 8)

Cluster from Top Down

1 2 4 5 3

lexomics.wheatoncollege.edu

(5)

(Slide 9)

Cluster from Top Down

1 2 4 5 3

lexomics.wheatoncollege.edu

In this dendrogram chunks one and two are more similar to each other than they are to four or five. The branch containing chunks one and two is a clade. We usually label clades—at any level of the diagram—

with Greek letters, moving from left to right and top to bottom.

If we are trying to identify which individual segments are most similar to each other, we read the

dendrogram from the bottom up, identifying the first clades to join together as we move from bottom to top.

(Slide 10)

1 2 4 5 3

lexomics.wheatoncollege.edu

Cluster from Bottom Up

The connection between chunks one and two is the closest link to the bottom of the diagram. Therefore chunks one and two are most similar and join together first in the branching diagram.

(6)

(Slide 11)

1 2 4 5 3

lexomics.wheatoncollege.edu

Cluster from Bottom Up

Chunks four and five are similarly clustered, indicating that chunks four and five are more similar to each other than they are to any other chunks.

(Slide 12)

1 2 4 5 3

lexomics.wheatoncollege.edu

Cluster from Bottom Up

Moving up, we see that the next joining connects the clade with chunks one and two and the clade with chunks four and five. This geometry indicates that every chunk within that cluster is more similar to each other than to any chunks that join at a higher level (in this case the only other chunk is three).

(7)

(Slide 13) lexomics.wheatoncollege.edu

Verticals Show Degree of Difference

1 2 4 5 3

The height of the vertical lines, highlighted here in red, indicates the degree of difference between branches.

The longer the line, the greater the difference.

(Slide 14)

3 1 2 4 5 3

Horizontal Reading

lexomics.wheatoncollege.edu

The horizontal orientation of dendrograms is irrelevant. Imagine the dendrogram as a mobile, in which the arms can shift position, but the vertical height and subgroup organization remain constant. For example, it makes no difference whether segment three lies on the left or the right of the other clusters.

By themselves, dendrograms only tell us a bit about the similarities of word patterns in texts. But when we link them to other types of information—like ribbon diagrams or traditional kinds of textual analysis—we can often draw significant conclusions. A dendrogram can tell you where to look more closely, and it can provide independent support or contradiction for various hypotheses about similarity and difference.

By: Michael Drout and Leah Smith

(8)

lexomics.wheatoncollege.edu

"Any views, findings, conclusions, or recommendations expressed in this presentation do not necessarily reflect those of the National Endowment for the Humanities.”

Referensi

Dokumen terkait

Different voice conversion system can use different methods, but at least most of the system consists of several components such as methods to represent the specific

Nineteen swamp buffaloes were measured the conformation by linear measurement compared to 3D body scanner at different points : body height (A), heart girth (B), shoulder width

The proposed method utilizes three windows with different sizes in small, average and large to cover the entire LiDAR point clouds, then with a height difference threshold,

Genetic difference between two phenotypically similar members f Asteraceae by the use of intergenic spacer atpB – rbcL.. Molecular characteristics of two phenotypically identical

Animal performance Over the whole survey high N use was associated with significantly greater stocking rate and lambing percentage but similar wool production and slightly higher

Ingrowth roots have a signal much more similar to the expected above- ground value, but still 4.8%o greater, a difference that is attributed to the remobilization of unlabeled C from

It indicates that the use of process genre approach cannot improve the writing ability of the eleventh grade students of SMAN 4 Parepare.. b If t-test greater than t-table, H1 was

As the F > F critical and P≤ 0.05 therefore, the result indicates that there is significant difference in number of yield under different coloured shading nets compared to control