Programming Bitcoin - Digital Library Univ STEKOM

The views expressed in this work are those of the author and do not represent the publisher's views. Use of the information and instructions in this work is at your own risk.

Preface

Who Is This Book For?

What Do I Need to Know?

How Is the Book Arranged?

You will write many Python classes and constructs not only to validate transactions/blocks, but also to create your own and extensive transactions. To help your progress, we'll be looking at lots of code and diagrams throughout.

Setting Up

The book is designed this way so that you can do the "fun" part of coding. You can run any “cell” and view the results as if this were an interactive Python shell.

Answers

The unit tests are written for you, but you'll need to write Python code to pass the tests. You will need to edit the relevant file by clicking on a link like the "this test" link in Figure P-3.

Conventions Used in This Book

Using Code Examples

O’Reilly Online Learning

How to Contact Us

Acknowledgments

He also helped me on my seminars and is just a great person to talk to. He is one of those people who make others sharper, and I am grateful for his meaningful friendship.

Finite Fields

Learning Higher-Level Math

As I said, this chapter and the other two may feel a bit disjointed, but I encourage you to persevere. The basics here will not only make understanding Bitcoin much easier, but will also make understanding the Schnorr token much easier.

Finite Field Definition

Defining Finite Sets

Constructing a Finite Field in Python

After setting up Jupyter Notebook (see “Setup” on page xv), you can navigate to code-ch01/Chap‐. For the next exercise, you'll want to open ecc.py by clicking the link in the Exercise 1 box.

Modulo Arithmetic

We can also see this as "wrapping" in the sense that we pass 0 every time we go forward 12 hours.

Figure 1-3. Clock going forward 47 hours

Modulo Arithmetic in Python

Finite Field Addition and Subtraction

Again, for clarity, we use –f to distinguish field subtraction and negation from integer subtraction and negation.

Coding Addition and Subtraction in Python

We need to return an instance of the class that can be conveniently accessed with self.__class__. Note that you could use FieldElement instead of self.__class__, but that would not make the method easily inheritable.

Finite Field Multiplication and Exponentiation

We pass the two initializing arguments, num and self.prime, to the __init__ method we saw earlier. Again, finite fields must be defined so that the operations always result in a number within the field.

Coding Multiplication in Python

No matter what k you choose, as long as it is greater than 0, multiplying the entire set by k will result in the same set as you started with. If the order of the set was a composite number, multiplying the set by one of the divisors would result in a smaller set.

Coding Exponentiation in Python

The answer to exercise 5 is why fields must have prime powers of the elements. It turns out that the exponent does not need to be a member of a finite field for the math to work.

Finite Field Division

This is very counterintuitive, since we generally think of 2/f7 or 7/f5 as fractions, not pretty finite field elements. This is indeed a very good question; to answer this, we will have to use the result of Exercise 7.

Fermat’s Little Theorem

We can reduce the division problem to a multiplication problem, as long as we can figure out what b–1 is. To reduce the cost, we can use the pow function in Python, which performs exponentiation.

Redefining Exponentiation

As a bonus, we can reduce very large exponents at the same time, since ap–1 = 1.

Conclusion

Elliptic Curves

Definition

The only real difference between the elliptic curve and the cubic curve in Figure 2-3 is the y2 term on the left. If it helps, an elliptic curve can be thought of as taking a cubic equation graph (Figure 2-6), flattening the part above the x-axis (Figure 2-7) and then flattening that part below the x- to reflect (Figure 2-8).

Coding Elliptic Curves in Python

Points are equal if and only if they are on the same curve and have the same coordinates. In other words, __init__ will throw an exception when the point is not on the curve.

Point Addition

Two points define a line, so since that line must intersect the curve once more, that third point reflected on the x-axis is the result of point addition. We can calculate point addition quite easily with a formula, but intuitively, the result of point addition can be almost anywhere given two points on the curve.

Figure 2-10. Line intersects at only one point

Math of Point Addition

This is clear since the line passing through A and B will intersect the curve a third time at the same place, regardless of the order. While this does not prove the associativity of dot addition, the visual should at least give you the intuition that it is true.

Coding Point Addition

Consider the case where the two points are additive inverses (that is, they have the same x but different y, resulting in a vertical line).

Deriving the Point Addition Formula

There is a result of the so-called Viet formula, which states that the coefficients must be equal to each other if the roots are equal. But we need to reflect above the x-axis, so the right-hand side must be negated:.

We can derive the slope of the tangent line using some slightly more advanced math: calculus. To get this we need to take the derivative of both sides of the ellipse.

Coding One More Exception

We've covered what elliptic curves are, how they work, and how to do dot addition. We will now combine the concepts from the first two chapters to learn elliptic curve cryptography in Chapter 3.

Elliptic Curve Cryptography

Elliptic Curves over Reals

It turns out that we can use the addition equations over any field, including the finite fields we learned about in Chapter 1. The only difference is that we must use addition/subtraction/multiplication/division as defined in Chapter 1, not .

Elliptic Curves over Finite Fields

About the only pattern is that the curve is symmetrical right around the center, due to the y2 term. What is surprising is that we use the same point summation equations with the addi‐.

Figure 3-2. Elliptic curve over a finite field

Coding Elliptic Curves over Finite Fields

Point Addition over Finite Fields

All equations for elliptic curves operate on finite fields, which puts us on cre‐.

Coding Point Addition over Finite Fields

Scalar Multiplication for Elliptic Curves

Another property of scalar multiplication is that at a certain multiple we reach the point at infinity (remember that the point at infinity is the additive identity or 0). When we combine the fact that scalar multiplication is easy in one direction, but difficult in the other, and the mathematical properties of a group, we have exactly what we need for elliptic curve cryptography.

Figure 3-4. Scalar multiplication results for y 2 = x 3 + 7 over F 223 for point (170,142) Each point is labeled by how many times we’ve added the point

Why Is This Called the Discrete Log Problem?

Scalar Multiplication Redux

An asymmetric problem is one that is easy to calculate in one direction but difficult to reverse. We can find the results shown earlier, but that's because we have a small group.

Mathematical Groups

We will see in "Defining the curve for Bitcoin" on page 58 that when we have numbers that are much larger, the discrete log becomes an intrakta‐.

Identity

Closure

Invertibility

Commutativity

Associativity

Coding Scalar Multiplication

We need to double the point until we are past how big the coefficient can be. If you don't understand bitwise operators, think about representing the coefficient in binary and just adding the point where there are 1's.

Defining the Curve for Bitcoin

There are many cryptographic curves and they have different trade-offs between security and convenience, but the one we are most interested in is the one Bitcoin uses: secp256k1. This means that most numbers below 2256 are in the prime field, and so every point on the curve has x and y coordinates that can each be expressed in 256 bits.

Working with secp256k1

In case we initialize with the point at infinity, we should let x and y pass directly instead of using the S256Field class. We can also define __rmul__ a bit more efficiently, since we know the array order, n.

Public Key Cryptography

Note that the private key is a single 256-bit number and the public key is a coordinate (x,y), where x and y are each 256-bit numbers.

Signing and Verification

Ultimately this will be used in transactions, which will prove that the legitimate owners of the secrets are spending bitcoins.

Inscribing the Target

This is a fingerprint of the message of the shooter's intent, which anyone who verifies the knife. This is the basis of the signature algorithm, and the two numbers in a signature are r and s.

Verification in Depth

You may be wondering why there are two rounds when only one is needed to get a 256-bit number. Two rounds of sha256 do not necessarily prevent all possible attacks, but doing two rounds is a defense against some potential weaknesses.

Verifying a Signature

There is a well-known hash collision attack on SHA-1 called the birthday attack that makes finding collisions much easier. Google found a SHA-1 crash using some modification of a birthday attack and more in 2017.

Programming Signature Verification

So given a public key that is a point on the secp256k1 curve and a signature hash, z, we can verify whether a signature is valid or not.

Signing in Depth

Creating a Signature

This library is for educational purposes only, so please do not use any of the code explained to you here for production purposes. This is an example of a "brain wallet", which is a way to keep the private key in your head without having to memorize anything too hard.

Programming Message Signing

Importance of a Unique k

Another advantage from a testing point of view is that the signature for a given z and the same private key will be the same every time. Additionally, transactions using deterministic k will produce the same transaction each time, since the signature will not change.

Serialization

Uncompressed SEC Format

Computers can sometimes be more efficient if they use the opposite order, or 'little-endian', that is, starting with the small end first. Unfortunately, some serializations in Bitcoin (such as the x and y coordinates of the SEC format) are big-endian, while others (such as the transaction version number in Chapter 5) are little-endian.

Compressed SEC Format

The big advantage of the compressed SEC format is that it takes up only 33 bytes instead of 65 bytes. Find the compressed SEC format for the public key where the private key secrets are located:.

DER Signatures

Because we know that r is a 256-bit integer, r will be at most 32 bytes expressed as bigendian. In Python 3, you can convert a list of numbers to byte equivalents using bytes([some_integer1, some_integer2]).

Base58

Transmitting Your Public Key

The purpose of this loop is to determine how many bytes on the front are 0 bytes. Finally, we add all the zeros we counted in front, because otherwise they wouldn't show up as prefixes.

Address Format

Note that hashlib.sha256(s).digest does sha256 and the wrapper around it does ripemd160.

WIF Format

If the SEC format used for the public key address was compressed, add a suffix of 0x01.

Big- and Little-Endian Redux

Write a function little_endian_to_int that takes Python bytes, interprets those bytes in little-endian, and returns the number. Go to a testnet faucet and send some testnet coins to that address (it must start with m or n, otherwise something is wrong).

Transactions

Transaction Components

We need to know what network this transaction is on in order to fully validate it. This method must be a class method since the serialization will return a new instance of a Tx object.

Version

Inputs

The number of inputs is the next part of the transaction, as highlighted in Figure 5-4. This is the case with the number of entries in a normal transaction, so using variants helps save space.

Varints

Therefore, we need to define exactly which output within a transaction we use, which is captured in the previous transaction index. Note that the previous transaction ID is 32 bytes and the previous transaction index is 4 bytes.

Sequence and Locktime

We have no idea how much is being spent unless we look it up in the blockchain for the transaction(s) we are spending. Furthermore, we don't even know if the transaction unlocks the right box, so to speak, without knowing about the foregoing.

Parsing Script

Each node must verify that this transaction unlocks the right box and that it does not use non-existent bitcoins. Write the inputs that parse part of the parse method in Tx and the parse method of TxIn.

Outputs

UTXO Set

Locktime

Coding Transactions

This is because the fee is an implied amount, as described in the next section.

Transaction Fee

You may wonder why we don't just get the specific output for the transaction and instead get the entire transaction. By getting the entire transaction, we can verify the transaction ID (the hash256 of its contents) and be sure that we do get the transaction we asked for.

Calculating the Fee

Script

Mechanics of Script

Turing completeness in a programming language essentially means that a program has the ability to loop. Ethereum, which has Turing completeness in its smart contract language Solidity, solves this problem by forcing contracts to pay for program execution with something called "gas". An infinite loop will exhaust all the gas in the contract because by definition.

How Script Works

OP_HASH160 does a sha256 followed by ripemd160 to the top element Another very important operation is OP_CHECKSIG (Figure 6-5).

Coding Opcodes

Parsing the Script Fields

There are three specific opcodes for handling such elements: OP_PUSHDATA1, OP_PUSHDATA2, and OP_PUSHDATA4. OP_PUSHDATA4 means that the next 4 bytes contain how many bytes we need to read for the element.

Coding a Script Parser and Serializer

OP_PUSHDATA2 means that the next 2 bytes contain how many bytes we need to read for the element. For anything between 256 bytes and 520 bytes inclusive, we use OP_PUSHDATA2 <2-byte length of the element in little-endian> .

Combining the Script Fields

Coding the Combined Instruction Set

Standard Scripts

To show exactly how this all works, we'll start with one of the original scripts, pay-to-pubkey. Specifying where the bitcoins go is the job of the ScriptPubKey—this is the lockbox that receives the bitcoins.

Figure 6-8. Pay-to-pubkey (p2pk) ScriptSig

Coding Script Evaluation

The combo script will validate if the signature is valid, but will fail if the signature is invalid. ScriptSig will unlock the ScriptPubKey only if the signature is valid for that public key.

Stack Elements Under the Hood

and 175 are OP_CHECKSIG, OP_CHECKSIGVERIFY, OP_CHECKMULTI SIG, and OP_CHECKMULTISIGVERIFY, all of which require the signature hash value, z, from Chapter 3 for signature validation. If the command is not an opcode, it is an element, so we push that element into.

Problems with p2pk

However, the original use cases for p2pk were for IP-to-IP payments and mining. By the way, this IP-IP payment system has been phased out because it is insecure and prone to man-in-the-middle attacks.

Why Did Satoshi Use the Uncompressed SEC Format?

We know from Chapter 4 that secp256k1 public points are 33 bytes in compressed SEC and 65 bytes in uncompressed SEC format. The SEC format is usually encoded in hexadecimal, doubling the length (hex encodes 4 bits per character instead of 8).

Solving the Problems with p2pkh

If ScriptSig provides a public key that does not hash160 with a 20-byte hash in ScriptPubKey, the script will fail OP_EQUAL VERIFY (Figure 6-22). Another error condition is if ScriptSig has a public key that hashes 160 20-byte hash to ScriptPubKey but has an invalid signature.

Figure 6-16. Pay-to-pubkey-hash (p2pkh) ScriptSig

Scripts Can Be Arbitrarily Constructed

ScriptPubKey has the 20-byte hash160 of the public key and not the public key itself. OP_ADD will consume the top two elements of the stack, add them together, and push the sum onto the stack (Figure 6-31).

Utility of Scripts

Think of this ScriptPubKey as a lock with a very weak lock that anyone can break into. Once the UTXO is spent, included in the block and secured by Proof of Work, the coins are locked to another ScriptPubKey and are no longer as easily spent.

SHA-1 Piñata

Transaction Creation and Validation

Validating Transactions

Checking the Spentness of Inputs

Checking the Sum of the Inputs Versus the Sum of the Outputs

If the charge is negative, we know that the output_sum is greater than the input_sum, which is another way of saying that this transaction is trying to create bitcoins from ether. The fee was so negative that the C++ code was tricked into believing that the fee was actually positive by 0.1 BTC.

Checking the Signature

Empty all the ScriptSigs

We take the ScriptPubKey that the input points to and place it in place of the empty ScriptSig (Figure 7-4). Replace the ScriptSig (yellow highlighted field) for one of the inputs with the previous ScriptPubKey.

Append the hash type

We need the z, or signature hash, to enter the evaluate method, and we need to combine ScriptSig and ScriptPubKey. The quadratic hashing problem states that the time required to compute the signature hashes increases quadratically with the number of inputs in a transaction.

Verifying the Entire Transaction

This transaction had 5,569 inputs and 1 output and took many miners over a minute to validate, as the signature hashes for the transaction were expensive to calculate. Segwit (Chapter 13) fixes this with a different way to calculate the signature hash, which is specified in BIP0143.

Creating Transactions

Constructing the Transaction

Since this is more than 0.1 tBTC, we will want to return the rest to ourselves. Although it is generally a bad privacy and security practice to reuse addresses, we will send the bitcoins back to mzx5YhAH9kNHtcN481u6WkjeHjYtVeKVh2 to make it easier to build the transaction.

Making the Transaction

We want to pay enough on a per byte basis so that miners are motivated to include our transaction as soon as possible. The amount must be in satoshis; given that there are satoshis per BTC, we need to multiply and cast to a whole number.

Signing the Transaction

Creating Your Own Transactions on testnet

Once you have an address, you can get some coins from one of the testnet faucets that offer free testnet coins for testing purposes. One input must come from the tap, the other must come from the previous exercise; the output can be your own address.

Pay-to-Script Hash

Bare Multisig

Assuming the signatures are valid, the stack has a single 1 that validates the combined script (Figure 8-8). The stack elements consumed by OP_CHECKMULTISIG are said to be m, m different signatures, n and n different pubkeys.

Figure 8-1 shows what a ScriptPubKey for a 1-of-2 multisig looks like.

Coding OP_CHECKMULTISIG

Problems with Bare Multisig

Nodes keep track of the UTXO array, and a large ScriptPubKey is more expensive to keep track of. The creator of this transaction split the PDF whitepaper into 64-byte chunks, which were then converted into uncompressed invalid public keys.

Pay-to-Script-Hash (p2sh)

If this exact instruction set ends with a 1 on the stack, then the RedeemScript (the top item in Figure 8-9) is parsed and then added to the Script instruction set. When executing this sequence, if the stack is left with a 1, the RedeemScript is inserted into the Script command set.

Coding p2sh

Similarly, we know that the next command is the 20-byte hash value, and the third command is OP_EQUAL, which is what we tested for in the if statement above it. We run OP_HASH160, 20-byte hash push on the stack, and OP_EQUAL as normal.

More Complicated Scripts

The remainder should be 1, which op_verify verifies (OP_VERIFY consumes one element and returns nothing).

Addresses

Replace the ScriptSig of the input we're checking with RedeemScript Step 3: Add the hash type. We have verified one of the two signatures required to unlock this multisig p2sh.

Blocks

Coinbase Transactions

ScriptSig

BIP0034

Coinbase transactions in different blocks are required to have different ScriptSigs and thus different transaction IDs. This rule remains necessary as it would otherwise allow duplicate coinbase transaction IDs across multiple blocks.

Block Headers

For a block, this is similar in the sense that the version field reflects the capabilities of the software that produced the block. Previously, this was used as a way to indicate a single function that was ready to be implemented by the block's miner.

Previous Block

The >> operator is the real bit-shift operator, which discards the rightmost 29 bits, leaving only the top 3 bits. In our case, we first shift right by 4 bits and then check that the rightmost bit is 1.

Merkle Root

Timestamp

Bits

Nonce

Proof-of-Work

On average, 1022 (or 10 trillion trillion) random 256-bit numbers must be generated before such a small number is found. To return to the analogy, the process of finding proof-of-work requires that we pro-.

How a Miner Generates New Hashes

We intentionally print this number as 64 hexadecimal digits to show how small it is in 256-bit terms. In other words, we have to average 1022 in hashes to find one that small.

The Target

Difficulty

Checking That the Proof-of-Work Is Sufficient

Difficulty Adjustment