Machine learning and AI image generation on a local machine with the use of Stable Diffusion

Subject: Submission of Project Report on "Machine Learning and AI Image Generation Locally Using Stable Diffusion". I am writing to formally submit my project report on "Machine Learning and AI Image Generation Locally Using Stable Diffusion", completed as part of my BBA program at United International University. Over the course of this project, I gained considerable technical knowledge and capability in developing an image generation system, based on machine learning algorithms, that can run on a local machine.

It is important to recognize that the successful completion of this project would not have been possible without your valuable advice and support. Your open approach to supervising the project encouraged me to explore the tech world at a deeper level and take on a project related to machine learning and AI. I am also grateful to United International University for providing the resources and facilities that enabled me to conduct the research and complete this project.

Finally, I would like to thank the community and researchers whose work I have cited and referenced in this project report. Their contributions laid the groundwork for this project and shaped my understanding of the topic. This project report discusses how Stable Diffusion models can be used to obtain state-of-the-art synthesis results from image data and other types of data.

Running Stable Diffusion and related diffusion models on local hardware in this way also makes it possible to add more information and depth to images, greatly improving image quality.

Introduction

The scope of this project is to reproduce a Midjourney-level image generation process on personal hardware, in order to learn how machine learning and AI in general work during generative processing: setting up Stable Diffusion in a private, secure environment (a personal computer), and learning and implementing both text-to-image and image-to-image generation.

Literature Review

AARON was an advanced computer program with the ability to create complex drawings and images. AARON used a prescribed set of rules and constraints to produce its artistic creations, illustrating a program's ability to improve its performance by learning from previous results. More recently, the advent of deep learning has produced increasingly realistic results.

One early deep-learning effort in this direction, Google's DeepDream, had the objective of creating new pieces of artwork based on input images; it demonstrated the ability to generate aesthetically pleasing and whimsical visuals from input photos (Image-to-Image). 2020 then witnessed a significant advance in Text-to-Text capabilities with the introduction of the third iteration of the Generative Pre-trained Transformer (GPT-3) by OpenAI, a privately owned research organization.

The GPT-3 model represents a significant advance in text-to-text models, demonstrating increased versatility and the ability to produce highly coherent text in response to a wide variety of natural language prompts. Such models can produce high-quality text that closely resembles human language. The success of GPT-3 led to the emergence of CLIP, a further cutting-edge model developed by OpenAI with the specific purpose of establishing connections between textual and visual content.

This extensive training allows CLIP to effectively classify images against user-supplied labels. In addition, it can produce textual descriptions that accurately represent a given input image, a process commonly called Image-to-Text. Building on these developments, OpenAI launched DALL-E, which makes it possible to generate visually appealing images from textual descriptions (Text-to-Image).

The development and training of Stable Diffusion, an open-source Text-to-Image model with performance comparable to DALL-E, was made possible by the artificial intelligence company Stability AI. At the same time, the use of large-scale, unfiltered Internet data in training CLIP has given it a tendency to reproduce the biased and unfair preconceptions that exist throughout culture and society. Several influential papers published in the early 2020s demonstrated the outstanding capabilities of diffusion models, showing that they can compete with Generative Adversarial Networks (GANs) in the field of image generation.

Research Methodology

Additionally, the training objective can emphasize the perceptually most salient components of a target by using a reweighted variational bound. Because each generation starts from a different random seed, each image created with Stable Diffusion is likely to be unique; the likelihood of another individual producing the same AI-generated image with Stable Diffusion is minimal.

The Stable Diffusion checkpoint merger is a recently developed feature in Stable Diffusion that allows users to combine several models, using various merging techniques, with the aim of improving the quality of their AI-generated images. Like the Stable Diffusion prompt matrix, checkpoint merging facilitates the production of AI-generated images with a high degree of visual realism, tailored to an individual's specific artistic needs. Although Stable Diffusion models have been extensively trained on many aspects of image production, it is important to recognize that they have certain limitations.
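As a rough illustration of the idea, the following is a minimal sketch of a weighted-sum merge of two checkpoints, assuming both use the .safetensors format with identical tensor layouts; the file names and the 0.5 weight are placeholders, not the web UI's actual implementation.

```python
# Minimal sketch of a weighted-sum checkpoint merge, assuming two
# checkpoints with identical tensor layouts; file names and the
# interpolation weight are placeholders.
from safetensors.torch import load_file, save_file

alpha = 0.5  # 0.0 keeps model A unchanged, 1.0 replaces it with model B

model_a = load_file("model_a.safetensors")
model_b = load_file("model_b.safetensors")

merged = {}
for key, tensor_a in model_a.items():
    tensor_b = model_b.get(key)
    if tensor_b is not None and tensor_b.shape == tensor_a.shape:
        # Linear interpolation between the two models' weights
        merged[key] = (1.0 - alpha) * tensor_a + alpha * tensor_b
    else:
        merged[key] = tensor_a  # fall back to model A where keys differ

save_file(merged, "merged_model.safetensors")
```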

One viable approach to combining two Stable Diffusion checkpoints, and to training individual models, is to use one of the existing third-party interfaces. The Stable Diffusion prompt matrix is an additional feature incorporated into the Stable Diffusion technology. To merge several prompts into one effectively, you must tell Stable Diffusion which specific prompts you wish to use within the prompt matrix.

While both the prompt matrix and Stable Diffusion checkpoint merging serve the purpose of integrating the essential components required for image production, it is important to note that they have distinct characteristics. The prompt matrix facilitates simultaneous testing of different styles by combining multiple prompts to enhance the visual presentation of images. More generally, the concept of stable diffusion refers to the process of spreading or distributing a substance or entity in a consistent and predictable manner over time.
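Conceptually, a prompt matrix expands one base prompt with optional parts into every possible combination, producing one image per variant. The following small sketch mirrors that expansion; the base prompt and options are illustrative, not the web UI's actual code.

```python
# Sketch of prompt-matrix expansion: every subset of the optional
# parts is appended to the base prompt, one variant per image.
from itertools import combinations

base = "a busy city street in a modern city"
options = ["illustration", "cinematic lighting", "highly detailed"]

prompts = []
for r in range(len(options) + 1):
    for subset in combinations(options, r):
        prompts.append(", ".join((base,) + subset))

for p in prompts:
    print(p)  # 2**len(options) prompt variants in total
```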

Stable Diffusion models and their various adaptations are highly effective at generating innovative visual representations. The Image-to-Image tool gives users some control over the style of the generated image, but only some. To provide finer control, a new image generation neural network called ControlNet, which is based on Stable Diffusion, has been developed.

ControlNet is trained using a pre-existing Stable Diffusion model that has previously been trained on a large dataset comprising billions of images. Two copies of the Stable Diffusion model's weights are created: one copy is locked, so its weights remain unchanged, while the second copy has trainable weights that are adjusted during training.
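A minimal sketch of that locked/trainable split is shown below, using a tiny stand-in network rather than the real Stable Diffusion UNet, with a placeholder loss; the real ControlNet also routes the trainable branch through zero-initialized convolutions, which this sketch omits.

```python
# Sketch of the locked vs. trainable weight copies described above,
# using a small stand-in network instead of the Stable Diffusion UNet.
import copy
import torch
import torch.nn as nn

base = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))

locked = base                    # original weights, kept frozen
trainable = copy.deepcopy(base)  # copy whose weights are fine-tuned

for p in locked.parameters():
    p.requires_grad = False      # lock the first copy

optimizer = torch.optim.Adam(trainable.parameters(), lr=1e-4)

x = torch.randn(8, 64)           # stand-in for a conditioning input
# Combine the frozen and trainable paths (real ControlNet adds the
# trainable branch's output through zero-initialized convolutions).
out = locked(x) + trainable(x)
loss = out.pow(2).mean()         # placeholder for the training objective
loss.backward()                  # gradients flow only into the copy
optimizer.step()
```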

Figure 3-2: Diffusion model image generation from noise.

Findings, Implementation & Experiments

It is better not to install the latest Python version, as it may not be supported by the most stable Automatic1111 and diffusion model builds (Python 3.10.6 is commonly recommended). Once Python is installed, create a new directory for the Automatic1111 web UI, named "Stable Diffusion" or any other preferred title, anywhere on the computer. Click in the directory path in the address bar, type "cmd", and press the Enter key to launch a command prompt window in that folder.

Back in the CMD window, run "git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git"; a newly created folder with the required files will appear, and the download may take some time. Model files such as LoRA models [min23] or SDXL [sta23], which generate the AI images, can be found on Hugging Face or GitHub. Installing a chosen Stable Diffusion model is relatively simple and can be done in a few steps.
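As an optional alternative to a browser download, a checkpoint can also be fetched programmatically with the huggingface_hub client; the repository id and filename below are illustrative examples, not a specific recommendation.

```python
# Fetch a checkpoint from Hugging Face; repo id and filename are
# illustrative examples of a Stable Diffusion checkpoint.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="stabilityai/stable-diffusion-2-1",
    filename="v2-1_768-ema-pruned.safetensors",
)
print(local_path)  # copy this file into models/Stable-diffusion
```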

After the download process is complete, locate the Stable Diffusion folder and move the .ckpt or .safetensors file into the designated "models" > "Stable-diffusion" subfolder. After installation, launch Stable Diffusion with Automatic1111; the interface will open after copying the local host URL shown in CMD (by default http://127.0.0.1:7860) into a browser. Negative Prompt: Provide a detailed description of the elements or features you wish to exclude from your artwork.

Generate: After configuring all the required parameters, proceed by selecting the designated option to generate your image. Batch Count generates images sequentially, one after the other; this incurs a longer processing time but reduces the use of video random access memory (vRAM), since each image is generated after the previous one completes. Batch Size, by contrast, generates all images in the same pass, which is faster but necessitates a higher allocation of vRAM.

CFG Scale: A lower CFG value allows the model to drift from the given prompt and exhibit a higher degree of creativity, while a higher CFG value constrains the model to match the prompt closely without allowing much freedom. Width and Height: Specify the dimensions of the image in terms of width and height. Save: Stable Diffusion automatically saves images to a designated folder known as the 'output' folder.

Send to extras: Sending an image to the 'Extras' tab moves the image there. This tab allows the user to resize (upscale) the image without experiencing a significant loss of detail.
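The same parameters also map onto a programmatic workflow. The following is a minimal text-to-image sketch using the Hugging Face diffusers library (a separate route from the Automatic1111 UI described above); the model id, prompts, seed, and output path are illustrative placeholders.

```python
# Minimal text-to-image sketch with the diffusers library; model id,
# prompts, seed, and output path are illustrative placeholders.
import os
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator("cuda").manual_seed(42)  # fixed seed for reproducibility
image = pipe(
    prompt="a watercolor painting of a lighthouse at dawn",
    negative_prompt="blurry, low quality",  # features to exclude
    guidance_scale=7.5,        # CFG scale: prompt adherence vs. creativity
    width=512, height=512,     # image dimensions
    num_inference_steps=30,
    generator=generator,
).images[0]

os.makedirs("output", exist_ok=True)
image.save("output/lighthouse.png")  # mirrors the web UI's 'output' folder
```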

Figure 4-1: ControlNet

Conclusion

The technique can help human designers and AI models collaborate and leverage each other's strengths. Legal and copyright difficulties may arise over ownership and usage rights as AI-generated content becomes more common. In conclusion, Stable Diffusion with Automatic1111 and its models lets a computer's GPU drive deep machine learning and AI to generate images, interpreting the text of user prompts to determine what to generate.

References

What's in a text-to-image prompt: The potential of stable diffusion in visual arts education.
