An Assistive Tool for Authoring Visualization Thumbnails

In recent years, news organizations have widely adopted storytelling in data-driven articles, a practice that falls under the category of data journalism. Throughout this thesis, the term (data) visualization refers to all forms of data-driven visualization, especially as used by news organizations, which also includes, for example, complex infographics.

Motivation and purpose of research

Power BI and a designer with Adobe Photoshop or Adobe Illustrator skills were required, in addition to the journalists, to create a data-driven thumbnail. It consists of two components; the first is a Python-based Natural Language Processing (NLP) model that extracts data from the data-driven document at a given URL.

Figure 1: An authoring tool to create informative visuals from an unstructured document.

Thesis layout

Then, we also check VTComp's compatibility by processing several data-driven articles from leading news organizations such as The Economist, The New York Times, and Pew Research Center. In this section, I review previous research on interconnected domains, including data-driven journalism, tools for creating visualizations, tools for generating data-driven images from unstructured text data, and the importance of different visualization components in designing thumbnails.

Data-driven Journalism

Visual thumbnails give the first impression of an article, which helps users decide whether to continue reading it or move on to another, and may also help them remember it. Visualizing formatted data is straightforward and useful, but in most cases the data must be preprocessed before a visualization can be created.

Figure 2: Worldwide population in 2018 [1].

Visualization from unstructured data

A previous study [40] also stated that data exploration is difficult and error-prone without data mining tools. Moreover, extracting the target data from human-sourced unstructured text can be error-prone, and visualizations built on such data can present inaccurate information.

Figure 4: Visualization assists in presenting complex and large volumes of data. The table on the left contains the countries and their GDP

Data-Driven Storytelling

Next, design tools help authors create designs in a very short time, but authors sometimes need additional training to get used to them. In this paragraph, we discuss the importance of the World Wide Web as a visualization platform and why it is growing in importance. The first chart from the top includes an image of a monster and represents House and Senate campaign spending in the shape of the monster's teeth.

It has become an attractive means of delivering information quickly from one part of the world to another. To improve the accuracy of design classification, we created a custom dataset using the VTComp user interface, in which participants drew their own shapes with a pencil on an iPad. In this section, we present an overview of VTCompinion (VTComp), the tool used to author visualization thumbnails, as shown in Figure 8.

The VTComp tool has two main parts: the first is the model, which includes subcomponents such as a preprocessing unit that transforms data insights into visualizations and formatted text; the second is a graphical user interface, as shown in Figure 11.

Figure 6: Usage of visual embellishments in visualization elements for the comprehension and memorability of charts [4]

Pre-processing Component

The preprocessing component analyzes the document and prepares the information needed by the authoring component. On the other hand, when the data is already in structured form, data reduction may involve editing, scaling, encoding, sorting, collecting, and summarizing it in the form of tables. Additionally, in VTComp, the process of creating a data-driven visualization thumbnail begins with parsing the target document's URL, after which the author searches the source for graphical and textual insights, as depicted in Figure 8.

VTComp extracts entities, forms the relations between them automatically, and keeps a record of them. Keywords are the smallest units of information that can provide an overview of the entire document; to extract important keywords, we used the TF-IDF (Term Frequency-Inverse Document Frequency) approach [85]. Our model first detects temporal information in unstructured text and also identifies fields containing dates within document tables, for example, "August 8, 1994" or a date of a format such as

Also, relative expressions are resolved against the article's publication date: "Today" is taken to be the publication date, and an expression such as "a decade" is counted from the publication date, either into the future or into the past.
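The handling of temporal expressions described above can be illustrated with a minimal sketch. It is not the VTComp implementation: the function names (`parse_absolute_date`, `shift_years`, `resolve_relative_date`), the small set of supported phrases, and the fallback to February 28 for leap days are all assumptions made for illustration. Only the standard library is used, and month names are assumed to be in English.

```python
from datetime import date, datetime
from typing import Optional


def parse_absolute_date(text: str) -> date:
    """Parse a date written like "August 8, 1994" (hypothetical helper)."""
    return datetime.strptime(text.strip(), "%B %d, %Y").date()


def shift_years(day: date, years: int) -> date:
    """Move a date by whole years; February 29 falls back to February 28."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        # The target year has no February 29 (assumed fallback rule).
        return day.replace(year=day.year + years, day=28)


def resolve_relative_date(expression: str, publication_date: date) -> Optional[date]:
    """Resolve a relative expression against the article's publication date.

    Only a few phrases are covered in this sketch; anything else returns None.
    """
    text = expression.strip().lower()
    if text == "today":
        return publication_date
    if text == "a decade ago":
        return shift_years(publication_date, -10)
    if text == "in a decade":
        return shift_years(publication_date, 10)
    return None


if __name__ == "__main__":
    published = date(2020, 3, 15)  # illustrative publication date
    print(parse_absolute_date("August 8, 1994"))
    print(resolve_relative_date("Today", published))
    print(resolve_relative_date("a decade ago", published))
```

Returning `None` for unsupported phrases keeps unresolved expressions visible to the author rather than silently mapping them to a wrong date.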

Figure 9: Several operations performed by the pre-processing component to generate the target dataset.
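One of the pre-processing operations, keyword extraction with TF-IDF, can be sketched as follows. The thesis does not state which TF-IDF variant or library is used, so this sketch assumes the plain formulation (term frequency normalized by document length, multiplied by log(N / document frequency)) and a simple lowercase alphabetic tokenizer. The function names `tokenize` and `tfidf_keywords` and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens only (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(documents: List[str], doc_index: int, top_k: int = 3) -> List[str]:
    """Return the top_k terms of one document ranked by TF-IDF score."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    target = tokenized[doc_index]
    term_counts = Counter(target)
    total_terms = len(target)

    scores = {
        term: (count / total_terms) * math.log(n_docs / doc_freq[term])
        for term, count in term_counts.items()
    }
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_k]


if __name__ == "__main__":
    sample_docs = [  # illustrative sentences, not taken from the study
        "gdp growth slowed as gdp forecasts fell",
        "the election results as expected",
        "growth in the election turnout",
    ]
    print(tfidf_keywords(sample_docs, 0, top_k=1))
```

Terms that occur in every document receive a score of zero under this formulation, which is why common words do not surface as keywords.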

Visual Interface

In addition, VTComp lets the author read the article through the continue-reading option, which is located next to the process button. To let the author expand the design surface, VTComp also provides options for collapsing and expanding specific views, which is useful when designing data-driven visuals. Inserting custom text as textual notes: this feature allows the author to add additional information to the design.

VTComp offers the option to add textual annotations as auxiliary elements that help the author express data insights visually. VTComp allows the author to add multiple HROs and to resize, rotate, and move them freely. To let the author use these shapes, VTComp provides basic geometric shapes such as circles, rectangles, and lines.

VTComp allows the author to use this tool and to adjust the opacity and width of its stroke, as shown in Figure 14.

Pilot Study

In this section, I outline the experiment in which participants created interactive visual thumbnails for data-driven journalism using VTComp (see section 3). Before conducting the final experiment, we ran several pilot studies in the preliminary phase with lab members who have design and data journalism backgrounds, in order to discover bugs and flaws in the tool. The aim of the experiment is to evaluate the performance, effectiveness, and usability of the system.

Final Study

To observe the user experience, we divided the participants into two groups, the VTComp group and the Sketch group. The experiment to create informative visuals took approximately 30 minutes for each individual interview session, excluding the user's introductory session. Finally, we conducted a semi-structured interview session in which we obtained valuable feedback (discussed in the results section).

Participants told us about the strengths and weaknesses of the visual creation tools currently available. When we introduced VTComp to these participants and asked them to use it, the feedback we received was encouraging; they also noted that VTComp is an easy-to-use tool that could be useful for less technical users and beginners in the design domain. The semi-structured interview was also recorded in written form to assess the strengths and weaknesses of the systems.

In this figure, I used five different data-driven news articles from different news agencies such as The New York Times, Pew Research Center, and FiveThirtyEight.

Figure 16: Experiment process for comparing performance between the VTComp group and the Sketch group.

Extraction Error Evaluation

The NLP model removes some entities because it misinterprets their meaning and incorrectly associates them with the document date. To assist the author, if any entity is missing or incorrect, the author can easily add or update it using the VTComp feature that shows the data behind the created visual charts.

Use Scenarios

To assist the author, if an entity is missing or incorrect, it can easily be added or updated through the VTComp feature that shows the data behind the created visual diagrams.

a) VTComp initial view; the annotations mark the model response sections.

Figure 19 shows several examples of VTComp's visual response that can help the author quickly decide that an article at a particular URL is not a suitable source for data-driven visualization. However, if the author wants to continue with a particular article, they can create custom visuals with the help of VTComp's options, such as adding HROs and some basic annotation components. The decision was made in less than 20 seconds in all cases, with most of that time spent copying, pasting, and waiting for extraction; once processing of the item is complete, the decision is essentially instant.

None of the three contains sufficient temporal information, so all of them can quickly be ruled out as a suitable basis for expressive data-driven visualization.

Figure 18: The overall VTComp visualization creation process. We demonstrate this process using a data-driven news article [6] that contains unstructured data

Experiment Results

P16 and P7 used the text information view to get an overview of the article, as shown in Figure 21. We found that participants wanted some items to open automatically as guidelines when selected. However, we also observed that some participants wanted to add HROs that are available neither in the tool-extracted data nor on the iPad.

To observe the participants' behavior in more detail, we analyzed the interview results and their think-aloud diaries. During the task, we also noted suggestions from participants, who said that exporting the extracted data in CSV or Excel format would be a useful addition, so that the data could continue to be used outside the VTComp environment. In addition to extracting entities using the relationships between them, there are other tasks, such as designing the generated visual graphics on top of the model.

Mayer, “Off the beaten tracks: exploring three aspects of web navigation,” in Proceedings of the 15th international conference on World Wide Web, WWW 2006, Edinburgh, Scotland, UK, May

L. Hollan, “Exploration and explanation in computational notebooks,” in Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI 2018, Montreal, QC, Canada, April

R. Myers, “The story in the notebook: Exploratory data science using a literate programming tool,” in Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI 2018, Montreal, QC, Canada, April R.