Participatory Design for Big Data

Big Data analysis resembles much of what we have known from AI for decades, in particular from DM and IR. Correlations in data may stand for relationships between facts pointing to a phenomenon. If I am using keywords related to a certain disease, the inference goes, I am suffering from this disease and require appropriate treatment. We see immediately that the observed phenomenon is first of all an assumption about both the person and the illness, and these assumptions can be wrong. I may look up this information for somebody else, for instance. The combination of keywords may also stand for a completely different phenomenon.

These assumptions are founded on the analysis models. The verification of these models, however, is barely part of model design. Usually, these models emerge from statistical methods, such as factor or time series analysis, or are based on hidden Markov models, to name the most prominent areas. As long as the models detect a critical mass of correlations, the problem of false positives does not receive sufficient attention.

“We need to avoid the temptation of following a data-driven approach instead of a problem-driven one” [5]. Information consumers can usually sketch their information demand, a summary of the data they need to solve their information problem. They have a deep understanding of the foundations of their domain.

Thus, we need a stronger involvement of humans in the value chain of Big Data analysis. Sustainable success with Big Data, however, requires more than just controlling the results produced by Big Data analytics. Integrating more competence, in particular domain competence, means a more active role for the human actor in all stages of the value chain. This kind of user engagement goes beyond user requirement analysis, participatory design, and acceptance testing during the development of Big Data analytic systems. It means a more active role for the information consumers, enabled by self-service features. Such self-service IT may take the form of user-friendly versions of analytic tools that enable information consumers to conduct their own analytics. This smooth integration of domain and tool knowledge completes the picture of self-service discovery, which is meanwhile also demanded by industry [12]. There is no doubt that without proper technological and methodological support, the benefits of Big Data remain out of reach.

Design for Big Data, however, requires the engagement of information consumers. When their needs drive design, Big Data will provide the insights they require.

It is too often the case that the technology required and described by the users is not well understood by designers. There are several human-centered design approaches to overcome this lack of mutual understanding. In user-centered design, users are employed to test and verify the usability of the system. Participatory design [11] deepens this form of user engagement: it understands users as part of the design team.

An organization’s information ecosystem usually hosts databases, content management systems, analytic tools, master data management systems, and the like that produce the flavor of information specific to this organization. Furthermore, there are communication tools, CRM and BPM systems, that feed the ecosystem with data, too. When we talk about information systems, we mainly mean systems and the data they manage. The users see these systems as different data channels that provide different blends of information. With the help of IT, they can retrieve their information through these channels, or they request reports containing the required information. For the users, these channels have precise qualities in terms of the semantics of the information they deliver. From this perspective, it is probably a good idea to take these semantics as building blocks for an information governance framework.

Likewise, by assigning a set of keywords to every data channel, we can attach similar semantic badges to any concept reflected in the data of that channel. If we link all semantic badges across all data channels, we get a complete semantic representation of the information ecosystem the information consumers are dealing with. The network of semantic badges communicates what information the users can expect.
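To make this linking concrete, here is a minimal sketch in Python; the channel names and keyword sets are illustrative assumptions, not taken from the text:

    # Minimal sketch: semantic badges as keyword sets attached to data channels.
    # Channel names and keywords below are illustrative assumptions.
    from itertools import combinations

    badges = {
        "crm":        {"customer", "contract", "price", "date"},
        "reporting":  {"product", "price", "region", "date"},
        "content_ms": {"document", "customer", "product"},
    }

    # Link badges across channels: two channels are connected whenever their
    # badges share at least one concept; the shared concepts label the link.
    links = {
        (a, b): badges[a] & badges[b]
        for a, b in combinations(badges, 2)
        if badges[a] & badges[b]
    }

    for (a, b), shared in links.items():
        print(f"{a} <-> {b}: {sorted(shared)}")

The resulting link structure is exactly the network of badges described above: it tells users which concepts connect which channels.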

The management of such a semantic representation is quite a human endeavor. The information consumers author suitable keywords representing the concept that constitutes a particular piece of information. The set of keywords thus acts as a blueprint for this information.

At first, a badge is an abstract representation of information, written in natural language; it is not machine processable at this stage. A contract may have a badge labeled with the concepts “vendor,” “buyer,” “product,” “date,” and “price.” The badge assigned to a diagnosis of a patient may list the key concepts “name,” “age,” “observations,” and “date.” Each of these concept representations can be composed of further concepts. The concept “product” may stand for a “property” described by “location,” “size,” and further attributes like the building erected on it. The observations stated in the medical report can be further detailed along the organs they address and typical phenomena, like “hemorrhage,” “lump,” etc. The concept “organ” again can be further detailed into its constituent parts, and so on.

In the end, we get a semantic representation of concepts that resembles a thesaurus. However, it is not a general thesaurus but an individual one, adapted to the specifics of the respective information ecosystem. Nor is it a strict thesaurus in which all concepts are tightly integrated. It is rather a collection of more or less loosely coupled fragments of a thesaurus, with the fragments dynamically changing both their composition and their relationships to one another. It is thus more suitable to consider semantic badges as ingredients of a common vocabulary. This vocabulary, in turn, is an asset of the information consumers, who manage it in cooperative authorship.
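As a rough sketch of such loosely coupled fragments, the contract and diagnosis badges above can be written as nested concept structures; everything below the top-level concepts that the text does not mention is an illustrative assumption:

    # Sketch: badges as nested concept dictionaries, i.e., loosely coupled
    # thesaurus fragments. Sub-concepts beyond the text's examples are assumptions.
    contract_badge = {
        "vendor": {}, "buyer": {}, "date": {}, "price": {},
        "product": {                      # "product" may stand for a "property"
            "property": {"location": {}, "size": {}, "building": {}},
        },
    }

    diagnosis_badge = {
        "name": {}, "age": {}, "date": {},
        "observations": {                 # detailed along organs and phenomena
            "organ": {"constituent_part": {}},
            "phenomenon": {"hemorrhage": {}, "lump": {}},
        },
    }

    def concepts(badge, prefix=""):
        """Flatten a badge into path-like concept names."""
        out = []
        for name, sub in badge.items():
            path = f"{prefix}/{name}" if prefix else name
            out.append(path)
            out.extend(concepts(sub, path))
        return out

    # The fragments stay loosely coupled: they only meet in the shared
    # vocabulary, e.g., both contribute the concept "date".
    vocabulary = set(concepts(contract_badge)) | set(concepts(diagnosis_badge))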

People working with information have a data-driven mindset per se [10, 15]; that is, they resort to mental models [9] that abstractly reflect the facts they expect to encounter in their information environment [1]. This mindset enables them to sketch blueprints of the things they are looking for. Information consumers can express these blueprints in a way that can later be processed by machines.

Fig. 2.1 Design cycle for the cooperative development of blueprints that govern the discovery process

These expressions are far from being programming instructions but reflect the users’ “natural” engineering knowledge [16]. The machine then takes the blueprints and identifies these facts in the data, even though the blueprint abstracts away many details of the facts.
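A minimal sketch of this matching step, assuming a blueprint is read as a set of required concepts and a record as a flat field map (the field names and values are hypothetical):

    # Sketch: matching a blueprint against a data record. The blueprint
    # abstracts away details; extra fields in the record are simply ignored.
    contract_blueprint = {"vendor", "buyer", "product", "date", "price"}

    record = {
        "vendor": "ACME Ltd.", "buyer": "J. Smith", "product": "property #17",
        "date": "2015-06-01", "price": 250_000,
        "notary": "K. Lee", "clause_count": 42,  # details the blueprint ignores
    }

    def matches(blueprint, record):
        # A fact is identified when the record covers every blueprint concept.
        return blueprint <= record.keys()

    print(matches(contract_blueprint, record))  # True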

Experimenting with data is an essential attitude that stimulates data discovery experiences. People initiate and control discovery with a certain belief (a predisposition or bias reinforced by years of expertise) and gradually refine this belief. They gather data, try their hypotheses in a sandbox first, and check the results against their blueprints; then, after sufficient iterations, they operationalize their findings in their individual world and discuss them with their colleagues. After having thoroughly tested their hypotheses, information consumers institutionalize them in their corporate world, that is, cultivate them in their information ecosystem. Reflecting on the corporate blueprint in their individual world may then yield ideas for further discoveries, and the participatory cycle starts anew (Fig. 2.1).
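The cycle itself can be sketched as a simple loop; every function below is a stub standing in for tooling the text leaves open, so all names are placeholders:

    # Sketch of the participatory discovery cycle (cf. Fig. 2.1).
    # Every function is a stub; real tooling would replace each of them.
    def gather_data(belief):           return [belief]             # stub
    def run_in_sandbox(belief, data):  return {"hits": len(data)}  # try hypotheses
    def check(results, blueprint):     return results["hits"] >= blueprint["min_hits"]
    def refine(belief, results):       return belief + "*"         # sharpen the belief

    def discovery_cycle(belief, blueprint, max_rounds=5):
        for _ in range(max_rounds):
            results = run_in_sandbox(belief, gather_data(belief))
            if check(results, blueprint):     # results match the blueprint:
                return results                # operationalize, then institutionalize
            belief = refine(belief, results)  # otherwise refine and iterate
        return None

    print(discovery_cycle("initial belief", {"min_hits": 1}))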

Their language knowledge and mental models constitute the shallow knowledge that is necessary and sufficient to engineer statements processable by the discovery services [13]. The blueprints thus serve two purposes: they reflect the semantic qualities of the facts that need to be discovered, and simultaneously, they are the building blocks of the metalanguage that, when correctly syndicated, supports data integration and sharing. While syndicating metadata along their domain competence, users implicitly foster active compliance with organizational data governance policies.
