pax_global_header00006660000000000000000000000064151240242720014511gustar00rootroot0000000000000052 comment=90d381ea80a9b935a64264dc985e5f1732f5c9bf golang-github-zeebo-blake3-0.2.4/000077500000000000000000000000001512402427200165045ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/.gitignore000066400000000000000000000000571512402427200204760ustar00rootroot00000000000000*.pprof *.test *.txt *.out /upstream /go.work golang-github-zeebo-blake3-0.2.4/LICENSE000066400000000000000000000160251512402427200175150ustar00rootroot00000000000000This work is released into the public domain with CC0 1.0. ------------------------------------------------------------------------------- Creative Commons Legal Code CC0 1.0 Universal CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER. Statement of Purpose The laws of most jurisdictions throughout the world automatically confer exclusive Copyright and Related Rights (defined below) upon the creator and subsequent owner(s) (each and all, an "owner") of an original work of authorship and/or a database (each, a "Work"). Certain owners wish to permanently relinquish those rights to a Work for the purpose of contributing to a commons of creative, cultural and scientific works ("Commons") that the public can reliably and without fear of later claims of infringement build upon, modify, incorporate in other works, reuse and redistribute as freely as possible in any form whatsoever and for any purposes, including without limitation commercial purposes. These owners may contribute to the Commons to promote the ideal of a free culture and the further production of creative, cultural and scientific works, or to gain reputation or greater distribution for their Work in part through the use and efforts of others. For these and/or other purposes and motivations, and without any expectation of additional consideration or compensation, the person associating CC0 with a Work (the "Affirmer"), to the extent that he or she is an owner of Copyright and Related Rights in the Work, voluntarily elects to apply CC0 to the Work and publicly distribute the Work under its terms, with knowledge of his or her Copyright and Related Rights in the Work and the meaning and intended legal effect of CC0 on those rights. 1. Copyright and Related Rights. A Work made available under CC0 may be protected by copyright and related or neighboring rights ("Copyright and Related Rights"). Copyright and Related Rights include, but are not limited to, the following: i. the right to reproduce, adapt, distribute, perform, display, communicate, and translate a Work; ii. moral rights retained by the original author(s) and/or performer(s); iii. publicity and privacy rights pertaining to a person's image or likeness depicted in a Work; iv. rights protecting against unfair competition in regards to a Work, subject to the limitations in paragraph 4(a), below; v. rights protecting the extraction, dissemination, use and reuse of data in a Work; vi. database rights (such as those arising under Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, and under any national implementation thereof, including any amended or successor version of such directive); and vii. other similar, equivalent or corresponding rights throughout the world based on applicable law or treaty, and any national implementations thereof. 2. Waiver. To the greatest extent permitted by, but not in contravention of, applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights and associated claims and causes of action, whether now known or unknown (including existing as well as future claims and causes of action), in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each member of the public at large and to the detriment of Affirmer's heirs and successors, fully intending that such Waiver shall not be subject to revocation, rescission, cancellation, termination, or any other legal or equitable action to disrupt the quiet enjoyment of the Work by the public as contemplated by Affirmer's express Statement of Purpose. 3. Public License Fallback. Should any part of the Waiver for any reason be judged legally invalid or ineffective under applicable law, then the Waiver shall be preserved to the maximum extent permitted taking into account Affirmer's express Statement of Purpose. In addition, to the extent the Waiver is so judged Affirmer hereby grants to each affected person a royalty-free, non transferable, non sublicensable, non exclusive, irrevocable and unconditional license to exercise Affirmer's Copyright and Related Rights in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "License"). The License shall be deemed effective as of the date CC0 was applied by Affirmer to the Work. Should any part of the License for any reason be judged legally invalid or ineffective under applicable law, such partial invalidity or ineffectiveness shall not invalidate the remainder of the License, and in such case Affirmer hereby affirms that he or she will not (i) exercise any of his or her remaining Copyright and Related Rights in the Work or (ii) assert any associated claims and causes of action with respect to the Work, in either case contrary to Affirmer's express Statement of Purpose. 4. Limitations and Disclaimers. a. No trademark or patent rights held by Affirmer are waived, abandoned, surrendered, licensed or otherwise affected by this document. b. Affirmer offers the Work as-is and makes no representations or warranties of any kind concerning the Work, express, implied, statutory or otherwise, including without limitation warranties of title, merchantability, fitness for a particular purpose, non infringement, or the absence of latent or other defects, accuracy, or the present or absence of errors, whether or not discoverable, all to the greatest extent permissible under applicable law. c. Affirmer disclaims responsibility for clearing rights of other persons that may apply to the Work or any use thereof, including without limitation any person's Copyright and Related Rights in the Work. Further, Affirmer disclaims responsibility for obtaining any necessary consents, permissions or other rights required for any use of the Work. d. Affirmer understands and acknowledges that Creative Commons is not a party to this document and has no duty or obligation with respect to this CC0 or use of the Work. golang-github-zeebo-blake3-0.2.4/Makefile000066400000000000000000000016431512402427200201500ustar00rootroot00000000000000asm: internal/alg/hash/hash_avx2/impl_amd64.s internal/alg/compress/compress_sse41/impl_amd64.s internal/alg/hash/hash_avx2/impl_amd64.s: avo/avx2/*.go ( cd avo; go run ./avx2 ) > internal/alg/hash/hash_avx2/impl_amd64.s internal/alg/compress/compress_sse41/impl_amd64.s: avo/sse41/*.go ( cd avo; go run ./sse41 ) > internal/alg/compress/compress_sse41/impl_amd64.s .PHONY: fmt fmt: go fmt ./... .PHONY: clean clean: rm -f internal/alg/hash/hash_avx2/impl_amd64.s rm -f internal/alg/compress/compress_sse41/impl_amd64.s .PHONY: test test: go test -race -bench=. -benchtime=1x .PHONY: vet vet: go tool dist list \ | sed -e 's#/# #g' \ | while read goos goarch; \ do \ echo $$goos $$goarch; \ GOOS=$$goos GOARCH=$$goarch CGO_ENABLED=1 GO386=softfloat go vet ./...; \ GOOS=$$goos GOARCH=$$goarch CGO_ENABLED=1 GO386=softfloat go vet -tags=purego ./...; \ done golang-github-zeebo-blake3-0.2.4/README.md000066400000000000000000000127241512402427200177710ustar00rootroot00000000000000# BLAKE3

go.dev Go Report Card SourceGraph

Pure Go implementation of [BLAKE3](https://blake3.io) with AVX2 and SSE4.1 acceleration. Special thanks to the excellent [avo](https://github.com/mmcloughlin/avo) making writing vectorized version much easier. # Benchmarks ## Caveats This library makes some different design decisions than the upstream Rust crate around internal buffering. Specifically, because it does not target the embedded system space, nor does it support multithreading, it elects to do its own internal buffering. This means that a user does not have to worry about providing large enough buffers to get the best possible performance, but it does worse on smaller input sizes. So some notes: - The Rust benchmarks below are all single-threaded to match this Go implementation. - I make no attempt to get precise measurements (cpu throttling, noisy environment, etc.) so please benchmark on your own systems. - These benchmarks are run on an i7-6700K which does not support AVX-512, so Rust is limited to use AVX2 at sizes above 8 kib. - I tried my best to make them benchmark the same thing, but who knows? :smile: ## Charts In this case, both libraries are able to avoid a lot of data copying and will use vectorized instructions to hash as fast as possible, and perform similarly. ![Large Full Buffer](/assets/large-full-buffer.svg) For incremental writes, you must provide the Rust version large enough buffers so that it can use vectorized instructions. This Go library performs consistently regardless of the size being sent into the update function. ![Incremental](/assets/incremental.svg) The downside of internal buffering is most apparent with small sizes as most time is spent initializing the hasher state. In terms of hashing rate, the difference is 3-4x, but in an absolute sense it's ~100ns (see tables below). If you wish to hash a large number of very small strings and you care about those nanoseconds, be sure to use the Reset method to avoid re-initializing the state. ![Small Full Buffer](/assets/small-full-buffer.svg) ## Timing Tables ### Small | Size | Full Buffer | Reset | | Full Buffer Rate | Reset Rate | |--------|-------------|------------|-|------------------|--------------| | 64 b | `205ns` | `86.5ns` | | `312MB/s` | `740MB/s` | | 256 b | `364ns` | `250ns` | | `703MB/s` | `1.03GB/s` | | 512 b | `575ns` | `468ns` | | `892MB/s` | `1.10GB/s` | | 768 b | `795ns` | `682ns` | | `967MB/s` | `1.13GB/s` | ### Large | Size | Incremental | Full Buffer | Reset | | Incremental Rate | Full Buffer Rate | Reset Rate | |----------|-------------|-------------|------------|-|------------------|------------------|--------------| | 1 kib | `1.02µs` | `1.01µs` | `891ns` | | `1.00GB/s` | `1.01GB/s` | `1.15GB/s` | | 2 kib | `2.11µs` | `2.07µs` | `1.95µs` | | `968MB/s` | `990MB/s` | `1.05GB/s` | | 4 kib | `2.28µs` | `2.15µs` | `2.05µs` | | `1.80GB/s` | `1.90GB/s` | `2.00GB/s` | | 8 kib | `2.64µs` | `2.52µs` | `2.44µs` | | `3.11GB/s` | `3.25GB/s` | `3.36GB/s` | | 16 kib | `4.93µs` | `4.54µs` | `4.48µs` | | `3.33GB/s` | `3.61GB/s` | `3.66GB/s` | | 32 kib | `9.41µs` | `8.62µs` | `8.54µs` | | `3.48GB/s` | `3.80GB/s` | `3.84GB/s` | | 64 kib | `18.2µs` | `16.7µs` | `16.6µs` | | `3.59GB/s` | `3.91GB/s` | `3.94GB/s` | | 128 kib | `36.3µs` | `32.9µs` | `33.1µs` | | `3.61GB/s` | `3.99GB/s` | `3.96GB/s` | | 256 kib | `72.5µs` | `65.7µs` | `66.0µs` | | `3.62GB/s` | `3.99GB/s` | `3.97GB/s` | | 512 kib | `145µs` | `131µs` | `132µs` | | `3.60GB/s` | `4.00GB/s` | `3.97GB/s` | | 1024 kib | `290µs` | `262µs` | `262µs` | | `3.62GB/s` | `4.00GB/s` | `4.00GB/s` | ### No ASM | Size | Incremental | Full Buffer | Reset | | Incremental Rate | Full Buffer Rate | Reset Rate | |----------|-------------|-------------|------------|-|------------------|------------------|-------------| | 64 b | `253ns` | `254ns` | `134ns` | | `253MB/s` | `252MB/s` | `478MB/s` | | 256 b | `553ns` | `557ns` | `441ns` | | `463MB/s` | `459MB/s` | `580MB/s` | | 512 b | `948ns` | `953ns` | `841ns` | | `540MB/s` | `538MB/s` | `609MB/s` | | 768 b | `1.38µs` | `1.40µs` | `1.35µs` | | `558MB/s` | `547MB/s` | `570MB/s` | | 1 kib | `1.77µs` | `1.77µs` | `1.70µs` | | `577MB/s` | `580MB/s` | `602MB/s` | | | | | | | | | | | 1024 kib | `880µs` | `883µs` | `878µs` | | `596MB/s` | `595MB/s` | `598MB/s` | The speed caps out at around 1 kib, so most rows have been elided from the presentation. golang-github-zeebo-blake3-0.2.4/api.go000066400000000000000000000101651512402427200176070ustar00rootroot00000000000000// Package blake3 provides an SSE4.1/AVX2 accelerated BLAKE3 implementation. package blake3 import ( "errors" "github.com/zeebo/blake3/internal/consts" "github.com/zeebo/blake3/internal/utils" ) // Hasher is a hash.Hash for BLAKE3. type Hasher struct { size int h hasher } // New returns a new Hasher that has a digest size of 32 bytes. // // If you need more or less output bytes than that, use Digest method. func New() *Hasher { return &Hasher{ size: 32, h: hasher{ key: consts.IV, }, } } // NewKeyed returns a new Hasher that uses the 32 byte input key and has // a digest size of 32 bytes. // // If you need more or less output bytes than that, use the Digest method. func NewKeyed(key []byte) (*Hasher, error) { if len(key) != 32 { return nil, errors.New("invalid key size") } h := &Hasher{ size: 32, h: hasher{ flags: consts.Flag_Keyed, }, } utils.KeyFromBytes(key, &h.h.key) return h, nil } // DeriveKey derives a key based on reusable key material of any // length, in the given context. The key will be stored in out, using // all of its current length. // // Context strings must be hardcoded constants, and the recommended // format is "[application] [commit timestamp] [purpose]", e.g., // "example.com 2019-12-25 16:18:03 session tokens v1". func DeriveKey(context string, material []byte, out []byte) { h := NewDeriveKey(context) _, _ = h.Write(material) _, _ = h.Digest().Read(out) } // NewDeriveKey returns a Hasher that is initialized with the context // string. See DeriveKey for details. It has a digest size of 32 bytes. // // If you need more or less output bytes than that, use the Digest method. func NewDeriveKey(context string) *Hasher { // hash the context string and use that instead of IV h := &Hasher{ size: 32, h: hasher{ key: consts.IV, flags: consts.Flag_DeriveKeyContext, }, } var buf [32]byte _, _ = h.WriteString(context) _, _ = h.Digest().Read(buf[:]) h.Reset() utils.KeyFromBytes(buf[:], &h.h.key) h.h.flags = consts.Flag_DeriveKeyMaterial return h } // Write implements part of the hash.Hash interface. It never returns an error. func (h *Hasher) Write(p []byte) (int, error) { h.h.update(p) return len(p), nil } // WriteString is like Write but specialized to strings to avoid allocations. func (h *Hasher) WriteString(p string) (int, error) { h.h.updateString(p) return len(p), nil } // Reset implements part of the hash.Hash interface. It causes the Hasher to // act as if it was newly created. func (h *Hasher) Reset() { h.h.reset() } // Clone returns a new Hasher with the same internal state. // // Modifying the resulting Hasher will not modify the original Hasher, and vice versa. func (h *Hasher) Clone() *Hasher { return &Hasher{size: h.size, h: h.h} } // Size implements part of the hash.Hash interface. It returns the number of // bytes the hash will output in Sum. func (h *Hasher) Size() int { return h.size } // BlockSize implements part of the hash.Hash interface. It returns the most // natural size to write to the Hasher. func (h *Hasher) BlockSize() int { return 64 } // Sum implements part of the hash.Hash interface. It appends the digest of // the Hasher to the provided buffer and returns it. func (h *Hasher) Sum(b []byte) []byte { if top := len(b) + h.size; top <= cap(b) && top >= len(b) { h.h.finalize(b[len(b):top]) return b[:top] } tmp := make([]byte, h.size) h.h.finalize(tmp) return append(b, tmp...) } // Digest takes a snapshot of the hash state and returns an object that can // be used to read and seek through 2^64 bytes of digest output. func (h *Hasher) Digest() *Digest { var d Digest h.h.finalizeDigest(&d) return &d } // Sum256 returns the first 256 bits of the unkeyed digest of the data. func Sum256(data []byte) (sum [32]byte) { out := Sum512(data) copy(sum[:], out[:32]) return sum } // Sum512 returns the first 512 bits of the unkeyed digest of the data. func Sum512(data []byte) (sum [64]byte) { if len(data) <= consts.ChunkLen { var d Digest compressAll(&d, data, 0, consts.IV) _, _ = d.Read(sum[:]) return sum } else { h := hasher{key: consts.IV} h.update(data) h.finalize(sum[:]) return sum } } golang-github-zeebo-blake3-0.2.4/api_example_test.go000066400000000000000000000074501512402427200223640ustar00rootroot00000000000000package blake3_test import ( "bytes" "fmt" "io" "github.com/zeebo/blake3" ) func ExampleNew() { h := blake3.New() h.Write([]byte("some data")) fmt.Printf("%x\n", h.Sum(nil)) //output: // b224a1da2bf5e72b337dc6dde457a05265a06dec8875be379e2ad2be5edb3bf2 } func ExampleNewKeyed() { h1, err := blake3.NewKeyed(bytes.Repeat([]byte("1"), 32)) if err != nil { panic(err) } h2, err := blake3.NewKeyed(bytes.Repeat([]byte("2"), 32)) if err != nil { panic(err) } h1.Write([]byte("some data")) h2.Write([]byte("some data")) fmt.Printf("%x\n", h1.Sum(nil)) fmt.Printf("%x\n", h2.Sum(nil)) //output: // 107c6f88638356d73cdb80f4d56ffe50abcbd9664a80c8ab2b83b1f946ebaba1 // b4be81075bef5a2448158ee5eeddaed897fe44a564c2cb088facbe7824a25073 } func ExampleDeriveKey() { out := make([]byte, 32) // See the documentation for good practices on what the context should be. blake3.DeriveKey( "my-application v0.1.1 session tokens v1", // context []byte("some material to derive key from"), // material out, ) fmt.Printf("%x\n", out) //output: // 98a3333af735f89eb301b56eaf6a77713aa03cdb0057e5b04352a63ea9204add } func ExampleNewDeriveKey() { // See the documentation for good practices on what the context should be. h := blake3.NewDeriveKey("my-application v0.1.1 session tokens v1") h.Write([]byte("some material to derive key from")) fmt.Printf("%x\n", h.Sum(nil)) //output: // 98a3333af735f89eb301b56eaf6a77713aa03cdb0057e5b04352a63ea9204add } func ExampleHasher_Reset() { h := blake3.New() h.Write([]byte("some data")) fmt.Printf("%x\n", h.Sum(nil)) h.Reset() h.Write([]byte("some data")) fmt.Printf("%x\n", h.Sum(nil)) //output: // b224a1da2bf5e72b337dc6dde457a05265a06dec8875be379e2ad2be5edb3bf2 // b224a1da2bf5e72b337dc6dde457a05265a06dec8875be379e2ad2be5edb3bf2 } func ExampleHasher_Digest() { h := blake3.New() h.Write([]byte("some data")) d := h.Digest() out := make([]byte, 64) d.Read(out) fmt.Printf("%x\n", out[0:32]) fmt.Printf("%x\n", out[32:64]) //output: // b224a1da2bf5e72b337dc6dde457a05265a06dec8875be379e2ad2be5edb3bf2 // 1b55688951738e3a7155d6398eb56c6bc35d5bca5f139d98eb7409be51d1be32 } func ExampleHasher_Clone() { h1 := blake3.New() h1.WriteString("some") h2 := h1.Clone() fmt.Println("before:") fmt.Printf("h1: %x\n", h1.Sum(nil)) fmt.Printf("h2: %x\n\n", h2.Sum(nil)) h2.WriteString(" data") fmt.Println("h2 modified:") fmt.Printf("h1: %x\n", h1.Sum(nil)) fmt.Printf("h2: %x\n\n", h2.Sum(nil)) h1.WriteString(" data") fmt.Println("h1 converged:") fmt.Printf("h1: %x\n", h1.Sum(nil)) fmt.Printf("h2: %x\n", h2.Sum(nil)) //output: // before: // h1: 2f610cf2e7e0dc09384cbaa75b2ae5d9704ac9a5ac7f28684342856e2867c707 // h2: 2f610cf2e7e0dc09384cbaa75b2ae5d9704ac9a5ac7f28684342856e2867c707 // // h2 modified: // h1: 2f610cf2e7e0dc09384cbaa75b2ae5d9704ac9a5ac7f28684342856e2867c707 // h2: b224a1da2bf5e72b337dc6dde457a05265a06dec8875be379e2ad2be5edb3bf2 // // h1 converged: // h1: b224a1da2bf5e72b337dc6dde457a05265a06dec8875be379e2ad2be5edb3bf2 // h2: b224a1da2bf5e72b337dc6dde457a05265a06dec8875be379e2ad2be5edb3bf2 } func ExampleDigest_Seek() { h := blake3.New() h.Write([]byte("some data")) d := h.Digest() out := make([]byte, 32) d.Seek(32, io.SeekStart) d.Read(out) fmt.Printf("%x\n", out) //output: // 1b55688951738e3a7155d6398eb56c6bc35d5bca5f139d98eb7409be51d1be32 } func ExampleSum256() { digest := blake3.Sum256([]byte("some data")) fmt.Printf("%x\n", digest[:]) //output: // b224a1da2bf5e72b337dc6dde457a05265a06dec8875be379e2ad2be5edb3bf2 } func ExampleSum512() { digest := blake3.Sum512([]byte("some data")) fmt.Printf("%x\n", digest[0:32]) fmt.Printf("%x\n", digest[32:64]) //output: // b224a1da2bf5e72b337dc6dde457a05265a06dec8875be379e2ad2be5edb3bf2 // 1b55688951738e3a7155d6398eb56c6bc35d5bca5f139d98eb7409be51d1be32 } golang-github-zeebo-blake3-0.2.4/api_test.go000066400000000000000000000155121512402427200206470ustar00rootroot00000000000000package blake3 import ( "bytes" "encoding/hex" "io" "strings" "testing" "github.com/zeebo/assert" ) func TestAPI_Vectors(t *testing.T) { check := func(t *testing.T, h *Hasher, input []byte, hash string) { buf := make([]byte, len(hash)/2) n, err := h.Write(input) assert.NoError(t, err) assert.Equal(t, n, len(input)) n, err = h.Digest().Read(buf) assert.NoError(t, err) assert.Equal(t, n, len(buf)) assert.Equal(t, hash, hex.EncodeToString(buf)) } t.Run("Basic", func(t *testing.T) { for _, tv := range vectors { h := New() check(t, h, tv.input(), tv.hash) } }) t.Run("Keyed", func(t *testing.T) { for _, tv := range vectors { h, err := NewKeyed([]byte(testVectorKey)) assert.NoError(t, err) check(t, h, tv.input(), tv.keyedHash) } }) t.Run("DeriveKey", func(t *testing.T) { for _, tv := range vectors { h := NewDeriveKey(testVectorContext) check(t, h, tv.input(), tv.deriveKey) } }) } func TestAPI(t *testing.T) { key := bytes.Repeat([]byte("a"), 32) context := strings.Repeat("c", 32) cases := []struct { name string new func() (*Hasher, error) data string result string size int }{ { name: "New", new: func() (*Hasher, error) { return New(), nil }, data: "", size: 32, result: "af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262", }, { name: "NewKeyed", new: func() (*Hasher, error) { return NewKeyed(key) }, data: "", size: 32, result: "cbf50f0463d68fd443cdb0826f387a6f57ba6dc4983ba2460fe822552d15d2f4", }, { name: "NewDeriveKey", new: func() (*Hasher, error) { return NewDeriveKey(context), nil }, data: "", size: 32, result: "c5ce1763648ca67eecc8a471f8efccf19dd16178e91d33130d3ae67eadde71cc", }, { name: "New+SmallInput", new: func() (*Hasher, error) { return New(), nil }, data: "some data", size: 32, result: "b224a1da2bf5e72b337dc6dde457a05265a06dec8875be379e2ad2be5edb3bf2", }, { name: "New+LargeInput", new: func() (*Hasher, error) { return New(), nil }, data: strings.Repeat("a", 10240), size: 32, result: "9afd0ba102b2cc68be10ba4d383b3139b97ed36d425b82631a7a1e2424088f7e", }, { name: "New+LargeOutput", new: func() (*Hasher, error) { return New(), nil }, data: "", size: 256, result: "" + "af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262" + "e00f03e7b69af26b7faaf09fcd333050338ddfe085b8cc869ca98b206c08243a" + "26f5487789e8f660afe6c99ef9e0c52b92e7393024a80459cf91f476f9ffdbda" + "7001c22e159b402631f277ca96f2defdf1078282314e763699a31c5363165421" + "cce14d30f8a03e49ee25d2ea3cd48a568957b378a65af65fc35fb3e9e12b81ca" + "2d82cdee16c68908a6772f827564336933c89e6908b2f9c7d1811c0eb795cbd5" + "898fe6f5e8af763319ca863718a59aff3d99660ef642483e217ef0c878582728" + "4fea90d42225e3cdd6a179bee852fd24e7d45b38c27b9c2f9469ea8dbdb893f0", }, } for _, c := range cases { t.Run(c.name, func(t *testing.T) { h, err := c.new() assert.NoError(t, err) n, err := h.Write([]byte(c.data)) assert.NoError(t, err) assert.Equal(t, n, len(c.data)) t.Run("Size", func(t *testing.T) { assert.Equal(t, h.Size(), 32) }) // check that we can sum multiple times, and that it does an append t.Run("Sum", func(t *testing.T) { assert.Equal(t, hex.EncodeToString(h.Sum(nil)), c.result[:64]) for i := 0; i < 64; i++ { assert.Equal(t, hex.EncodeToString(h.Sum(make([]byte, i)[:0])), c.result[:64]) } assert.Equal(t, hex.EncodeToString(h.Sum(make([]byte, 1))), "00"+c.result[:64]) }) // ensure that reset works by issuing the write again t.Run("Reset", func(t *testing.T) { _, _ = h.Write([]byte("some fake wrong data")) h.Reset() n, err := h.Write([]byte(c.data)) assert.NoError(t, err) assert.Equal(t, n, len(c.data)) assert.Equal(t, hex.EncodeToString(h.Sum(nil)), c.result[:64]) }) t.Run("Digest", func(t *testing.T) { t.Run("Read", func(t *testing.T) { // read up to i bytes of output in batches of at most size j for i := 0; i < len(c.result)/2; i++ { for j := 1; j < i; j++ { buf, d := make([]byte, i), h.Digest() for rem := buf; len(rem) > 0; { tmp := rem if len(tmp) > j { tmp = tmp[:j] } n, err := d.Read(tmp) assert.NoError(t, err) assert.Equal(t, n, len(tmp)) rem = rem[n:] } assert.Equal(t, hex.EncodeToString(buf), c.result[:2*i]) } } }) t.Run("SeekStart", func(t *testing.T) { // seek to position i and read the remainder for i := 0; i < len(c.result)/2; i++ { buf, d := make([]byte, len(c.result)/2-i), h.Digest() n64, err := d.Seek(int64(i), io.SeekStart) assert.NoError(t, err) assert.Equal(t, n64, i) n, err := d.Read(buf) assert.NoError(t, err) assert.Equal(t, n, len(buf)) assert.Equal(t, hex.EncodeToString(buf), c.result[2*i:]) } }) t.Run("SeekCurrent", func(t *testing.T) { buf, d := make([]byte, len(c.result)/2), h.Digest() // read then seek backward the amount we just read for i := 0; i < len(c.result)/2; i++ { n, err := d.Read(buf) assert.NoError(t, err) assert.Equal(t, n, len(buf)) assert.Equal(t, hex.EncodeToString(buf[:len(c.result)/2-i]), c.result[2*i:]) n64, err := d.Seek(-int64(n)+1, io.SeekCurrent) assert.NoError(t, err) assert.Equal(t, n64, i+1) } }) }) }) } } func TestAPI_Errors(t *testing.T) { var err error _, err = NewKeyed(make([]byte, 31)) assert.Error(t, err) d := New().Digest() _, err = d.Seek(-1, io.SeekStart) assert.Error(t, err) _, err = d.Seek(-1, io.SeekCurrent) assert.Error(t, err) _, err = d.Seek(0, io.SeekEnd) assert.Error(t, err) _, err = d.Seek(0, 9999) assert.Error(t, err) } func TestSum256(t *testing.T) { h := New() x := make([]byte, 1<<16) for i := range x { x[i] = byte(i) % 251 if i%32 != 0 { continue } h.Reset() _, _ = h.Write(x[:i]) var exp [32]byte _, _ = h.Digest().Read(exp[:]) got := Sum256(x[:i]) assert.Equal(t, hex.EncodeToString(got[:]), hex.EncodeToString(exp[:])) } } func TestSum512(t *testing.T) { h := New() x := make([]byte, 1<<16) for i := range x { x[i] = byte(i) % 251 if i%32 != 0 { continue } h.Reset() _, _ = h.Write(x[:i]) var exp [64]byte _, _ = h.Digest().Read(exp[:]) got := Sum512(x[:i]) assert.Equal(t, hex.EncodeToString(got[:]), hex.EncodeToString(exp[:])) } } func TestClone(t *testing.T) { sum := func(h *Hasher) string { return hex.EncodeToString(h.Sum(nil)) } h1 := New() h1.WriteString("1") h0 := h1.Clone() assert.Equal(t, sum(h1), sum(h0)) h2 := h1.Clone() assert.Equal(t, sum(h1), sum(h2)) h2.WriteString("2") assert.Equal(t, sum(h1), sum(h0)) h1.WriteString("2") assert.Equal(t, sum(h1), sum(h2)) } golang-github-zeebo-blake3-0.2.4/assets/000077500000000000000000000000001512402427200200065ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/assets/incremental.svg000066400000000000000000003305741512402427200230440ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/assets/large-full-buffer.svg000066400000000000000000002555571512402427200240530ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/assets/small-full-buffer.svg000066400000000000000000001671071512402427200240620ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/avo/000077500000000000000000000000001512402427200172715ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/avo/avx2/000077500000000000000000000000001512402427200201515ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/avo/avx2/common.go000066400000000000000000000156611512402427200220010ustar00rootroot00000000000000package main import ( . "github.com/mmcloughlin/avo/build" . "github.com/mmcloughlin/avo/operand" . "github.com/mmcloughlin/avo/reg" . "github.com/zeebo/blake3/avo" ) var msgSched = [7][16]int{ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}, {2, 6, 3, 10, 7, 0, 4, 13, 1, 11, 12, 5, 9, 14, 15, 8}, {3, 4, 10, 12, 13, 2, 7, 14, 6, 5, 9, 0, 11, 15, 8, 1}, {10, 7, 12, 9, 14, 3, 13, 15, 4, 0, 11, 2, 5, 8, 1, 6}, {12, 13, 9, 11, 15, 10, 14, 8, 7, 2, 5, 3, 0, 1, 6, 4}, {9, 14, 11, 5, 8, 12, 15, 1, 13, 3, 0, 10, 2, 6, 4, 7}, {11, 15, 5, 0, 1, 9, 8, 6, 14, 10, 2, 12, 3, 4, 7, 13}, } const roundSize = 32 const ( flag_chunkStart = 1 << 0 flag_chunkEnd = 1 << 1 flag_parent = 1 << 2 ) func transpose(c Ctx, alloc *Alloc, vs []*Value) { L01, H01, L23, H23 := alloc.Value(), alloc.Value(), alloc.Value(), alloc.Value() L45, H45, L67, H67 := alloc.Value(), alloc.Value(), alloc.Value(), alloc.Value() VPUNPCKLDQ(vs[1].GetOp(), vs[0].Get(), L01.Get()) VPUNPCKHDQ(vs[1].ConsumeOp(), vs[0].Consume(), H01.Get()) VPUNPCKLDQ(vs[3].GetOp(), vs[2].Get(), L23.Get()) VPUNPCKHDQ(vs[3].ConsumeOp(), vs[2].Consume(), H23.Get()) VPUNPCKLDQ(vs[5].GetOp(), vs[4].Get(), L45.Get()) VPUNPCKHDQ(vs[5].ConsumeOp(), vs[4].Consume(), H45.Get()) VPUNPCKLDQ(vs[7].GetOp(), vs[6].Get(), L67.Get()) VPUNPCKHDQ(vs[7].ConsumeOp(), vs[6].Consume(), H67.Get()) LL0123, HL0123, LH0123, HH0123 := alloc.Value(), alloc.Value(), alloc.Value(), alloc.Value() LL4567, HL4567, LH4567, HH4567 := alloc.Value(), alloc.Value(), alloc.Value(), alloc.Value() VPUNPCKLQDQ(L23.GetOp(), L01.Get(), LL0123.Get()) VPUNPCKHQDQ(L23.ConsumeOp(), L01.Consume(), HL0123.Get()) VPUNPCKLQDQ(H23.GetOp(), H01.Get(), LH0123.Get()) VPUNPCKHQDQ(H23.ConsumeOp(), H01.Consume(), HH0123.Get()) VPUNPCKLQDQ(L67.GetOp(), L45.Get(), LL4567.Get()) VPUNPCKHQDQ(L67.ConsumeOp(), L45.Consume(), HL4567.Get()) VPUNPCKLQDQ(H67.GetOp(), H45.Get(), LH4567.Get()) VPUNPCKHQDQ(H67.ConsumeOp(), H45.Consume(), HH4567.Get()) vs[0], vs[1], vs[2], vs[3] = alloc.Value(), alloc.Value(), alloc.Value(), alloc.Value() vs[4], vs[5], vs[6], vs[7] = alloc.Value(), alloc.Value(), alloc.Value(), alloc.Value() VINSERTI128(Imm(1), LL4567.Get().(VecPhysical).AsX(), LL0123.Get(), vs[0].Get()) VPERM2I128(Imm(49), LL4567.Consume(), LL0123.Consume(), vs[4].Get()) VINSERTI128(Imm(1), HL4567.Get().(VecPhysical).AsX(), HL0123.Get(), vs[1].Get()) VPERM2I128(Imm(49), HL4567.Consume(), HL0123.Consume(), vs[5].Get()) VINSERTI128(Imm(1), LH4567.Get().(VecPhysical).AsX(), LH0123.Get(), vs[2].Get()) VPERM2I128(Imm(49), LH4567.Consume(), LH0123.Consume(), vs[6].Get()) VINSERTI128(Imm(1), HH4567.Get().(VecPhysical).AsX(), HH0123.Get(), vs[3].Get()) VPERM2I128(Imm(49), HH4567.Consume(), HH0123.Consume(), vs[7].Get()) } func transposeMsg(c Ctx, alloc *Alloc, block GPVirtual, input, msg Mem) { for j := 0; j < 2; j++ { vs := alloc.Values(8) for i, v := range vs { VMOVDQU(input.Offset(1024*i+32*j).Idx(block, 1), v.Get()) } transpose(c, alloc, vs) for i, v := range vs { VMOVDQU(v.Consume(), msg.Offset(32*i+256*j)) } } } func transposeMsgN(c Ctx, alloc *Alloc, block GPVirtual, input, msg Mem, j int) { vs := alloc.Values(8) for i, v := range vs { VMOVDQU(input.Offset(1024*i+32*j).Idx(block, 1), v.Get()) } transpose(c, alloc, vs) for i, v := range vs { VMOVDQU(v.Consume(), msg.Offset(32*i+256*j)) } } func loadCounter(c Ctx, alloc *Alloc, mem, lo_mem, hi_mem Mem) { ctr0, ctr1 := alloc.Value(), alloc.Value() VPBROADCASTQ(mem, ctr0.Get()) VPADDQ(c.Counter, ctr0.Get(), ctr0.Get()) VPBROADCASTQ(mem, ctr1.Get()) VPADDQ(c.Counter.Offset(32), ctr1.Get(), ctr1.Get()) L, H := alloc.Value(), alloc.Value() VPUNPCKLDQ(ctr1.GetOp(), ctr0.Get(), L.Get()) VPUNPCKHDQ(ctr1.ConsumeOp(), ctr0.Consume(), H.Get()) LLH, HLH := alloc.Value(), alloc.Value() VPUNPCKLDQ(H.GetOp(), L.Get(), LLH.Get()) VPUNPCKHDQ(H.ConsumeOp(), L.Consume(), HLH.Get()) ctrl, ctrh := alloc.Value(), alloc.Value() VPERMQ(U8(0b11_01_10_00), LLH.ConsumeOp(), ctrl.Get()) VPERMQ(U8(0b11_01_10_00), HLH.ConsumeOp(), ctrh.Get()) VMOVDQU(ctrl.Consume(), lo_mem) VMOVDQU(ctrh.Consume(), hi_mem) } func finalizeRounds(alloc *Alloc, vs, h_vecs []*Value, h_regs []int) { finalized := [8]bool{} finalize: for j := 0; j < 8; j++ { free := alloc.FreeReg() for i, reg := range h_regs { if reg == free && !finalized[i] { h_vecs[i] = xorb(alloc, vs[i], vs[8+i]) finalized[i] = true continue finalize } } for i, f := range finalized[:] { if !f { h_vecs[i] = xorb(alloc, vs[i], vs[8+i]) finalized[i] = true continue finalize } } } } func round(c Ctx, alloc *Alloc, vs []*Value, r int, m func(n int) Mem) { ms := func(ns ...int) (o []Mem) { for _, n := range ns { o = append(o, m(msgSched[r][n])) } return o } partials := []struct { ms []Mem tab Mem rot int }{ {ms(0, 2, 4, 6), c.Rot16, 12}, {ms(1, 3, 5, 7), c.Rot8, 7}, {ms(8, 10, 12, 14), c.Rot16, 12}, {ms(9, 11, 13, 15), c.Rot8, 7}, } for i, p := range partials { addms(alloc, p.ms, vs[0:4]) tab := alloc.ValueFrom(p.tab) for j := 0; j < 4; j++ { vs[0+j] = add(alloc, vs[4+j], vs[0+j]) vs[12+j] = xor(alloc, vs[0+j], vs[12+j]) vs[12+j] = rotTv(alloc, tab, vs[12+j]) } tab.Free() for j := 0; j < 4; j++ { vs[8+j] = add(alloc, vs[12+j], vs[8+j]) vs[4+j] = xor(alloc, vs[8+j], vs[4+j]) } rotNs(alloc, p.rot, vs[4:8]) // roll the blocks if i == 1 { vs[4], vs[5], vs[6], vs[7] = vs[5], vs[6], vs[7], vs[4] vs[8], vs[9], vs[10], vs[11] = vs[10], vs[11], vs[8], vs[9] vs[12], vs[13], vs[14], vs[15] = vs[15], vs[12], vs[13], vs[14] } else if i == 3 { vs[4], vs[5], vs[6], vs[7] = vs[7], vs[4], vs[5], vs[6] vs[8], vs[9], vs[10], vs[11] = vs[10], vs[11], vs[8], vs[9] vs[12], vs[13], vs[14], vs[15] = vs[13], vs[14], vs[15], vs[12] } } } func addm(alloc *Alloc, mp Mem, a *Value) *Value { o := alloc.Value() VPADDD(mp, a.Consume(), o.Get()) return o } func addms(alloc *Alloc, mps []Mem, as []*Value) { for i, a := range as { as[i] = addm(alloc, mps[i], a) } } func add(alloc *Alloc, a, b *Value) *Value { o := alloc.Value() VPADDD(a.Get(), b.Consume(), o.Get()) return o } func xor(alloc *Alloc, a, b *Value) *Value { o := alloc.Value() VPXOR(a.Get(), b.Consume(), o.Get()) return o } func xorb(alloc *Alloc, a, b *Value) *Value { o := alloc.Value() switch { case a.HasReg(): VPXOR(b.ConsumeOp(), a.Consume(), o.Get()) case b.HasReg(): VPXOR(a.ConsumeOp(), b.Consume(), o.Get()) default: VPXOR(a.ConsumeOp(), b.Consume(), o.Get()) } return o } func rotN(alloc *Alloc, n int, a *Value) *Value { tmp, o := alloc.Value(), alloc.Value() VPSRLD(U8(n), a.Get(), tmp.Get()) VPSLLD(U8(32-n), a.Get(), a.Get()) VPOR(tmp.ConsumeOp(), a.Consume(), o.Get()) return o } func rotNs(alloc *Alloc, n int, as []*Value) { for i, a := range as { as[i] = rotN(alloc, n, a) } } func rotTv(alloc *Alloc, tab, a *Value) *Value { o := alloc.Value() VPSHUFB(tab.GetOp(), a.Consume(), o.Get()) return o } golang-github-zeebo-blake3-0.2.4/avo/avx2/ctx.go000066400000000000000000000024131512402427200212760ustar00rootroot00000000000000package main import ( . "github.com/mmcloughlin/avo/build" . "github.com/mmcloughlin/avo/operand" ) type Ctx struct { Rot16 Mem Rot8 Mem IV Mem BlockLen Mem Zero Mem Counter Mem } func NewCtx() (c Ctx) { c.IV = GLOBL("iv", RODATA|NOPTR) for n, v := range []U32{ 0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A, 0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19, } { DATA(4*n, v) } c.Rot16 = GLOBL("rot16_shuf", RODATA|NOPTR) for n, v := range []U8{ 0x02, 0x03, 0x00, 0x01, 0x06, 0x07, 0x04, 0x05, 0x0A, 0x0B, 0x08, 0x09, 0x0E, 0x0F, 0x0C, 0x0D, 0x12, 0x13, 0x10, 0x11, 0x16, 0x17, 0x14, 0x15, 0x1A, 0x1B, 0x18, 0x19, 0x1E, 0x1F, 0x1C, 0x1D, } { DATA(n, v) } c.Rot8 = GLOBL("rot8_shuf", RODATA|NOPTR) for n, v := range []U8{ 0x01, 0x02, 0x03, 0x00, 0x05, 0x06, 0x07, 0x04, 0x09, 0x0A, 0x0B, 0x08, 0x0D, 0x0E, 0x0F, 0x0C, 0x11, 0x12, 0x13, 0x10, 0x15, 0x16, 0x17, 0x14, 0x19, 0x1A, 0x1B, 0x18, 0x1D, 0x1E, 0x1F, 0x1C, } { DATA(n, v) } c.BlockLen = GLOBL("block_len", RODATA|NOPTR) for i := 0; i < 8; i++ { DATA(4*i, U32(64)) } c.Zero = GLOBL("zero", RODATA|NOPTR) for i := 0; i < 8; i++ { DATA(4*i, U32(0)) } c.Counter = GLOBL("counter", RODATA|NOPTR) for i := 0; i < 8; i++ { DATA(8*i, U64(i)) } return c } golang-github-zeebo-blake3-0.2.4/avo/avx2/hashF.go000066400000000000000000000107061512402427200215350ustar00rootroot00000000000000package main import ( . "github.com/mmcloughlin/avo/build" . "github.com/mmcloughlin/avo/operand" . "github.com/mmcloughlin/avo/reg" . "github.com/zeebo/blake3/avo" ) func HashF(c Ctx) { TEXT("HashF", 0, `func( input *[8192]byte, length uint64, counter uint64, flags uint32, key *[8]uint32, out *[32]uint32, chain *[8]uint32, )`) var ( input = Mem{Base: Load(Param("input"), GP64())} length = Load(Param("length"), GP64()).(GPVirtual) counter = Load(Param("counter"), GP64()).(GPVirtual) flags = Load(Param("flags"), GP32()).(GPVirtual) key = Mem{Base: Load(Param("key"), GP64())} out = Mem{Base: Load(Param("out"), GP64())} chain = Mem{Base: Load(Param("chain"), GP64())} ) loop := GP64() chunks := GP64() blocks := GP64() stash := GP64() { Comment("Allocate local space and align it") local := AllocLocal(roundSize + 32) LEAQ(local.Offset(31), stash) // TODO: avo improvement tmp := GP64() MOVQ(U64(31), tmp) NOTQ(tmp) ANDQ(tmp, stash) } alloc := NewAlloc(Mem{Base: stash}) defer alloc.Free() flags_mem := AllocLocal(8) counter_mem := AllocLocal(8) tmp := AllocLocal(32) ctr_lo_mem := AllocLocal(32) ctr_hi_mem := AllocLocal(32) msg := AllocLocal(32 * 16) var ( h_vecs []*Value h_regs []int vs []*Value iv []*Value ctr_low *Value ctr_hi *Value blen_vec *Value flags_vec *Value ) h_vecs = alloc.ValuesWith(8, key) blen_vec = alloc.ValueFrom(c.BlockLen) flags_vec = alloc.ValueWith(flags_mem) iv = alloc.ValuesWith(4, c.IV) ctr_low = alloc.ValueFrom(ctr_lo_mem) ctr_hi = alloc.ValueFrom(ctr_hi_mem) { Comment("Skip if the length is zero") XORQ(chunks, chunks) XORQ(blocks, blocks) TESTQ(length, length) JZ(LabelRef("skip_compute")) } { Comment("Compute complete chunks and blocks") // chunks = (length - 1) / 1024 SUBQ(U8(1), length) MOVQ(length, chunks) SHRQ(U8(10), chunks) // blocks = (length - 1) % 1024 / 64 * 64 MOVQ(length, blocks) ANDQ(U32(960), blocks) } Label("skip_compute") { Comment("Load some params into the stack (avo improvment?)") MOVL(flags, flags_mem) MOVQ(counter, counter_mem) } { Comment("Load IV into vectors") h_regs = make([]int, 8) for i, v := range h_vecs { h_regs[i] = v.Reg() _ = v.Get() } } { Comment("Build and store counter data on the stack") loadCounter(c, alloc, counter_mem, ctr_lo_mem, ctr_hi_mem) } { Comment("Set up block flags and variables for iteration") XORQ(loop, loop) ORL(U8(flag_chunkStart), flags_mem) } Label("loop") { Comment("Include end flags if last block") CMPQ(loop, U32(15*64)) JNE(LabelRef("round_setup")) ORL(U8(flag_chunkEnd), flags_mem) } Label("round_setup") { Comment("Load and transpose message vectors") transposeMsg(c, alloc, loop, input, msg) } { Comment("Load constants for the round") for _, v := range h_vecs { _ = v.Get() } _ = blen_vec.Get() _ = flags_vec.Get() for _, v := range iv { _ = v.Get() } _ = ctr_low.Get() _ = ctr_hi.Get() } { Comment("Save state for partial chunk if necessary") CMPQ(loop, blocks) JNE(LabelRef("begin_rounds")) for i, v := range h_vecs { tmp32 := GP32() VMOVDQU(v.Get(), tmp) MOVL(tmp.Idx(chunks, 4), tmp32) MOVL(tmp32, chain.Offset(4*i)) } } Label("begin_rounds") { Comment("Perform the rounds") vs = []*Value{ h_vecs[0], h_vecs[1], h_vecs[2], h_vecs[3], h_vecs[4], h_vecs[5], h_vecs[6], h_vecs[7], iv[0], iv[1], iv[2], iv[3], ctr_low, ctr_hi, blen_vec, flags_vec, } for r := 0; r < 7; r++ { Commentf("Round %d", r+1) roundF(c, alloc, vs, r, msg) } } { Comment("Finalize rounds") finalizeRounds(alloc, vs, h_vecs, h_regs) } { Comment("Fix up registers for next iteration") for i := 7; i >= 0; i-- { h_vecs[i].Become(h_regs[i]) } } { Comment("If we have zero complete chunks, we're done") CMPQ(chunks, U8(0)) JNE(LabelRef("loop_trailer")) CMPQ(blocks, loop) JEQ(LabelRef("finalize")) } Label("loop_trailer") { Comment("Increment, reset flags, and loop") CMPQ(loop, U32(15*64)) JEQ(LabelRef("finalize")) ADDQ(Imm(64), loop) MOVL(flags, flags_mem) JMP(LabelRef("loop")) } Label("finalize") { Comment("Store result into out") for i, v := range h_vecs { VMOVDQU(v.Consume(), out.Offset(32*i)) } } VZEROUPPER() RET() } func roundF(c Ctx, alloc *Alloc, vs []*Value, r int, mp Mem) { round(c, alloc, vs, r, func(n int) Mem { return mp.Offset(n * 32) }) } golang-github-zeebo-blake3-0.2.4/avo/avx2/hashP.go000066400000000000000000000040621512402427200215450ustar00rootroot00000000000000package main import ( . "github.com/mmcloughlin/avo/build" . "github.com/mmcloughlin/avo/operand" . "github.com/mmcloughlin/avo/reg" . "github.com/zeebo/blake3/avo" ) func HashP(c Ctx) { TEXT("HashP", NOSPLIT, `func( left *[32]uint32, right *[32]uint32, flags uint8, key *[8]uint32, out *[32]uint32, n int, )`) var ( left = Mem{Base: Load(Param("left"), GP64())} right = Mem{Base: Load(Param("right"), GP64())} flags = Load(Param("flags"), GP32()).(GPVirtual) key = Mem{Base: Load(Param("key"), GP64())} out = Mem{Base: Load(Param("out"), GP64())} ) stash := GP64() { Comment("Allocate local space and align it") local := AllocLocal(roundSize + 32) LEAQ(local.Offset(31), stash) // TODO: avo improvement tmp := GP64() MOVQ(U64(31), tmp) NOTQ(tmp) ANDQ(tmp, stash) } alloc := NewAlloc(Mem{Base: stash}) defer alloc.Free() flags_mem := AllocLocal(8) var ( h_vecs []*Value vs []*Value iv []*Value ctr_low *Value ctr_hi *Value blen_vec *Value flags_vec *Value ) { Comment("Set up flags value") MOVL(flags, flags_mem) } h_vecs = alloc.ValuesWith(8, key) iv = alloc.ValuesWith(4, c.IV) ctr_low = alloc.ValueFrom(c.Zero) ctr_hi = alloc.ValueFrom(c.Zero) blen_vec = alloc.ValueFrom(c.BlockLen) flags_vec = alloc.ValueWith(flags_mem) { Comment("Perform the rounds") vs = []*Value{ h_vecs[0], h_vecs[1], h_vecs[2], h_vecs[3], h_vecs[4], h_vecs[5], h_vecs[6], h_vecs[7], iv[0], iv[1], iv[2], iv[3], ctr_low, ctr_hi, blen_vec, flags_vec, } for r := 0; r < 7; r++ { Commentf("Round %d", r+1) roundP(c, alloc, vs, r, left, right) } } { Comment("Finalize") finalizeRounds(alloc, vs, h_vecs, nil) } { Comment("Store result into out") for i, v := range h_vecs { VMOVDQU(v.Consume(), out.Offset(32*i)) } } VZEROUPPER() RET() } func roundP(c Ctx, alloc *Alloc, vs []*Value, r int, left, right Mem) { round(c, alloc, vs, r, func(n int) Mem { if n < 8 { return left.Offset(n * 32) } else { return right.Offset((n - 8) * 32) } }) } golang-github-zeebo-blake3-0.2.4/avo/avx2/main.go000066400000000000000000000002051512402427200214210ustar00rootroot00000000000000package main import ( "github.com/mmcloughlin/avo/build" ) func main() { c := NewCtx() HashF(c) HashP(c) build.Generate() } golang-github-zeebo-blake3-0.2.4/avo/go.mod000066400000000000000000000001271512402427200203770ustar00rootroot00000000000000module github.com/zeebo/blake3/avo go 1.13 require github.com/mmcloughlin/avo v0.4.0 golang-github-zeebo-blake3-0.2.4/avo/go.sum000066400000000000000000000061371512402427200204330ustar00rootroot00000000000000github.com/mmcloughlin/avo v0.4.0 h1:jeHDRktVD+578ULxWpQHkilor6pkdLF7u7EiTzDbfcU= github.com/mmcloughlin/avo v0.4.0/go.mod h1:RW9BfYA3TgO9uCdNrKU2h6J8cPD8ZLznvfgHAeszb1s= github.com/yuin/goldmark v1.4.0/go.mod h1:mwnBkeHKe2W/ZEtQ+71ViKU8L12m81fl3OWwC1Zlc8k= golang.org/x/arch v0.0.0-20210923205945-b76863e36670/go.mod h1:5om86z9Hs0C8fWVUuoMHwpExlXzs5Tkyp9hOrfG7pp8= golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= golang.org/x/mod v0.4.2 h1:Gz96sIWK3OalVv/I/qNygP42zyoKp3xptRVCWRFEBvo= golang.org/x/mod v0.4.2/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= golang.org/x/net v0.0.0-20210805182204-aaa1db679c0d/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y= golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20210423082822-04245dca01da/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20210809222454-d867a43fc93e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20211030160813-b3129d9d1021 h1:giLT+HuUP/gXYrG2Plg9WTjj4qhfgaW424ZIFog3rlk= golang.org/x/sys v0.0.0-20211030160813-b3129d9d1021/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo= golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= golang.org/x/tools v0.1.7 h1:6j8CgantCy3yc8JGBqkDLMKWqZ0RDU2g1HVgacojGWQ= golang.org/x/tools v0.1.7/go.mod h1:LGqMHiF4EqQNHR1JncWGqT5BVaXmza+X+BDGol+dOxo= golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 h1:go1bK/D/BFZV2I8cIQd1NKEZ+0owSTG1fDTci4IqFcE= golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= rsc.io/pdf v0.1.1/go.mod h1:n8OzWcQ6Sp37PL01nO98y4iUCRdTGarVfzxY20ICaU4= golang-github-zeebo-blake3-0.2.4/avo/sse41/000077500000000000000000000000001512402427200202305ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/avo/sse41/compress.go000066400000000000000000000111541512402427200224140ustar00rootroot00000000000000package main import ( . "github.com/mmcloughlin/avo/build" . "github.com/mmcloughlin/avo/operand" . "github.com/mmcloughlin/avo/reg" ) func main() { ivMem := GLOBL("iv", RODATA|NOPTR) for n, v := range []U32{ 0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A, 0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19, } { DATA(4*n, v) } rot16Mem := GLOBL("rot16_shuf", RODATA|NOPTR) for n, v := range []U8{ 0x02, 0x03, 0x00, 0x01, 0x06, 0x07, 0x04, 0x05, 0x0A, 0x0B, 0x08, 0x09, 0x0E, 0x0F, 0x0C, 0x0D, 0x12, 0x13, 0x10, 0x11, 0x16, 0x17, 0x14, 0x15, 0x1A, 0x1B, 0x18, 0x19, 0x1E, 0x1F, 0x1C, 0x1D, } { DATA(n, v) } rot8Mem := GLOBL("rot8_shuf", RODATA|NOPTR) for n, v := range []U8{ 0x01, 0x02, 0x03, 0x00, 0x05, 0x06, 0x07, 0x04, 0x09, 0x0A, 0x0B, 0x08, 0x0D, 0x0E, 0x0F, 0x0C, 0x11, 0x12, 0x13, 0x10, 0x15, 0x16, 0x17, 0x14, 0x19, 0x1A, 0x1B, 0x18, 0x1D, 0x1E, 0x1F, 0x1C, } { DATA(n, v) } TEXT("Compress", NOSPLIT, `func( chain *[8]uint32, block *[16]uint32, counter uint64, blen uint32, flags uint32, out *[16]uint32, )`) var ( chain = Mem{Base: Load(Param("chain"), GP64())} block = Mem{Base: Load(Param("block"), GP64())} counter = Load(Param("counter"), GP64()).(GPVirtual) blen = Load(Param("blen"), GP32()).(GPVirtual) flags = Load(Param("flags"), GP32()).(GPVirtual) out = Mem{Base: Load(Param("out"), GP64())} ) rows := []VecVirtual{XMM(), XMM(), XMM(), XMM()} MOVUPS(chain.Offset(0*16), rows[0]) MOVUPS(chain.Offset(1*16), rows[1]) MOVUPS(ivMem, rows[2]) PINSRD(U8(0), counter.As32(), rows[3]) SHRQ(U8(32), counter) PINSRD(U8(1), counter.As32(), rows[3]) PINSRD(U8(2), blen, rows[3]) PINSRD(U8(3), flags, rows[3]) ms := []VecVirtual{XMM(), XMM(), XMM(), XMM()} MOVUPS(block.Offset(0*16), ms[0]) MOVUPS(block.Offset(1*16), ms[1]) MOVUPS(block.Offset(2*16), ms[2]) MOVUPS(block.Offset(3*16), ms[3]) rot16, rot8 := XMM(), XMM() MOVUPS(rot16Mem, rot16) MOVUPS(rot8Mem, rot8) { Comment("round 1") t0 := XMM() MOVAPS(ms[0], t0) // 3 2 1 0 SHUFPS(pack(2, 0, 2, 0), ms[1], t0) // 6 4 2 0 g(rows, t0, rot16, 12) // 6 4 2 0 t1 := XMM() MOVAPS(ms[0], t1) // 3 2 1 0 SHUFPS(pack(3, 1, 3, 1), ms[1], t1) // 7 5 3 1 g(rows, t1, rot8, 7) // 7 5 3 1 diagonalize(rows) t2 := XMM() MOVAPS(ms[2], t2) // b a 9 8 SHUFPS(pack(2, 0, 2, 0), ms[3], t2) // e c a 8 SHUFPS(pack(2, 1, 0, 3), t2, t2) // c a 8 e g(rows, t2, rot16, 12) // c a 8 e t3 := XMM() MOVAPS(ms[2], t3) // b a 9 8 SHUFPS(pack(3, 1, 3, 1), ms[3], t3) // f d b 9 SHUFPS(pack(2, 1, 0, 3), t3, t3) // d b 9 f g(rows, t3, rot8, 7) // d b 9 f undiagonalize(rows) ms[0] = t0 ms[1] = t1 ms[2] = t2 ms[3] = t3 } for i := 1; i < 7; i++ { tt := XMM() Commentf("round %d", i+1) t0 := XMM() MOVAPS(ms[0], t0) SHUFPS(pack(3, 1, 1, 2), ms[1], t0) SHUFPS(pack(0, 3, 2, 1), t0, t0) g(rows, t0, rot16, 12) t1 := XMM() MOVAPS(ms[2], t1) SHUFPS(pack(3, 3, 2, 2), ms[3], t1) PSHUFD(pack(0, 0, 3, 3), ms[0], tt) PBLENDW(U8(0b00110011), tt, t1) g(rows, t1, rot8, 7) diagonalize(rows) t2 := XMM() MOVAPS(ms[3], t2) PUNPCKLLQ(ms[1], t2) PBLENDW(U8(0b11000000), ms[2], t2) SHUFPS(pack(2, 3, 1, 0), t2, t2) g(rows, t2, rot16, 12) t3 := XMM() MOVAPS(ms[1], tt) PUNPCKHLQ(ms[3], tt) MOVAPS(ms[2], t3) PUNPCKLLQ(tt, t3) SHUFPS(pack(0, 1, 3, 2), t3, t3) g(rows, t3, rot8, 7) undiagonalize(rows) ms[0] = t0 ms[1] = t1 ms[2] = t2 ms[3] = t3 } Comment("finalize") PXOR(rows[2], rows[0]) PXOR(rows[3], rows[1]) tmp := XMM() MOVUPS(chain.Offset(0*16), tmp) PXOR(tmp, rows[2]) MOVUPS(chain.Offset(1*16), tmp) PXOR(tmp, rows[3]) MOVUPS(rows[0], out.Offset(0*16)) MOVUPS(rows[1], out.Offset(1*16)) MOVUPS(rows[2], out.Offset(2*16)) MOVUPS(rows[3], out.Offset(3*16)) RET() Generate() } func g(rows []VecVirtual, m VecVirtual, tab VecVirtual, n int) { PADDD(m, rows[0]) PADDD(rows[1], rows[0]) PXOR(rows[0], rows[3]) PSHUFB(tab, rows[3]) PADDD(rows[3], rows[2]) PXOR(rows[2], rows[1]) tmp := XMM() MOVAPS(rows[1], tmp) PSRLL(U8(n), rows[1]) PSLLL(U8(32-n), tmp) POR(tmp, rows[1]) } func pack(a, b, c, d int) U8 { return U8(a<<6 | b<<4 | c<<2 | d) } func diagonalize(rows []VecVirtual) { PSHUFD(pack(2, 1, 0, 3), rows[0], rows[0]) PSHUFD(pack(1, 0, 3, 2), rows[3], rows[3]) PSHUFD(pack(0, 3, 2, 1), rows[2], rows[2]) } func undiagonalize(rows []VecVirtual) { PSHUFD(pack(0, 3, 2, 1), rows[0], rows[0]) PSHUFD(pack(1, 0, 3, 2), rows[3], rows[3]) PSHUFD(pack(2, 1, 0, 3), rows[2], rows[2]) } golang-github-zeebo-blake3-0.2.4/avo/value.go000066400000000000000000000170261512402427200207420ustar00rootroot00000000000000package avo import ( "fmt" "runtime" . "github.com/mmcloughlin/avo/build" . "github.com/mmcloughlin/avo/operand" . "github.com/mmcloughlin/avo/reg" ) var ymmRegs = [...]VecPhysical{ Y0, Y1, Y2, Y3, Y4, Y5, Y6, Y7, Y8, Y9, Y10, Y11, Y12, Y13, Y14, Y15, } // // used set // type used map[int]struct{} func (u used) alloc(max int) (n int, ok bool) { for max == 0 || n < max { if _, ok := u[n]; !ok { u[n] = struct{}{} return n, true } n++ } return 0, false } func (u used) mustAlloc() (n int) { n, ok := u.alloc(0) if !ok { panic("unable to alloc") } return n } func (u used) free(n int) { delete(u, n) } // // alloc // type Alloc struct { m Mem regs used stack used values map[int]*Value ctr int spills int mslot int phys []VecPhysical span int } func NewAlloc(m Mem) *Alloc { return &Alloc{ m: m, regs: make(used), stack: make(used), values: make(map[int]*Value), ctr: 0, spills: 0, mslot: -1, phys: ymmRegs[:], span: 32, } } func (a *Alloc) stats(name, when string) { fmt.Printf("// [%s] %s: %d/16 free (%d total + %d spills + %d slots)\n", name, when, 16-len(a.regs), len(a.values), a.spills, a.mslot+1) } func (a *Alloc) newStateLive(reg int) stateLive { return stateLive{Reg: reg, phys: a.phys} } func (a *Alloc) newStateSpilled(slot int) stateSpilled { return stateSpilled{Slot: slot, mem: a.m, span: a.span, aligned: true} } func (a *Alloc) Debug(name string) func() { a.stats(name, "in") return func() { a.stats(name, "out") } } func (a *Alloc) FreeReg() int { n, ok := a.regs.alloc(16) if !ok { return -1 } a.regs.free(n) return n } func (a *Alloc) Free() { for id, v := range a.values { fmt.Println("leaked value:", id, "==", v.id, "\n", v.stack) } } func (a *Alloc) findOldestLive(except *Value) *Value { var oldest *Value for _, v := range a.values { if oldest == except || !v.state.Live() { continue } if oldest == nil || v.age < oldest.age { oldest = v } } return oldest } func (a *Alloc) allocSpot() valueState { reg, ok := a.regs.alloc(16) if ok { return a.newStateLive(reg) } slot := a.stack.mustAlloc() a.spills++ if slot > a.mslot { a.mslot = slot } return a.newStateSpilled(slot) } func (a *Alloc) allocReg(except *Value) int { reg, ok := a.regs.alloc(16) if ok { return reg } oldest := a.findOldestLive(except) state := oldest.state.(stateLive) oldest.displaceTo(a.allocSpot()) a.regs[state.Reg] = struct{}{} return state.Reg } func (a *Alloc) Value() *Value { a.ctr++ var buf [4096]byte v := &Value{ a: a, id: a.ctr, age: a.ctr, stack: string(buf[:runtime.Stack(buf[:], false)]), reg: -1, state: stateEmpty{}, } a.values[v.id] = v return v } func (a *Alloc) Values(n int) []*Value { out := make([]*Value, n) for i := range out { out[i] = a.Value() } return out } func (a *Alloc) ValueFrom(m Mem) *Value { v := a.Value() v.state = stateLazy{Mem: m} return v } func (a *Alloc) ValuesFrom(n int, m Mem) []*Value { out := make([]*Value, n) for i := range out { out[i] = a.ValueFrom(m.Offset(a.span * i)) } return out } func (a *Alloc) ValueWith(m Mem) *Value { v := a.Value() v.state = stateLazy{Mem: m, Broadcast: true} return v } func (a *Alloc) ValuesWith(n int, m Mem) []*Value { out := make([]*Value, n) for i := range out { out[i] = a.ValueWith(m.Offset(4 * i)) } return out } // // value states // type valueState interface { Op() Op Live() bool String() string ymmState() } type stateBase struct{} func (stateBase) Op() Op { panic("no location for this state") } func (stateBase) Live() bool { return false } func (stateBase) String() string { return "Base" } func (stateBase) ymmState() {} type stateEmpty struct { stateBase } func (stateEmpty) String() string { return "Empty" } type stateLive struct { stateBase Reg int phys []VecPhysical } func (s stateLive) Op() Op { return s.Register() } func (s stateLive) Live() bool { return true } func (s stateLive) String() string { return fmt.Sprintf("Live(%d)", s.Reg) } func (s stateLive) Register() VecPhysical { return s.phys[s.Reg] } type stateSpilled struct { stateBase mem Mem Slot int span int aligned bool } func (s stateSpilled) Op() Op { return s.GetMem() } func (s stateSpilled) String() string { return fmt.Sprintf("Spilled(%d)", s.Slot) } func (s stateSpilled) GetMem() Mem { return s.mem.Offset(s.span * s.Slot) } type stateLazy struct { stateBase Mem Mem Broadcast bool } func (s stateLazy) String() string { return fmt.Sprintf("Lazy(%s, %t)", s.Mem.Asm(), s.Broadcast) } // // value // type Value struct { a *Alloc id int age int stack string reg int // currently allocated register (sometimes dup'd in state) state valueState } func (v *Value) Reg() int { if v.reg < 0 { v.reg = v.a.allocReg(v) } return v.reg } func (v *Value) setState(state valueState) { v.freeSpot() v.state = state v.useSpot() } func (v *Value) Become(reg int) { // if we already are/will be it: done. if v.reg == reg { return } // if it's free: displace to it. if _, ok := v.a.regs[reg]; !ok { v.a.regs[reg] = struct{}{} v.displaceTo(v.a.newStateLive(reg)) return } // someone else owns it. displace them and then displace ourselves. for _, cand := range v.a.values { if cand.reg != reg { continue } state := cand.state cand.displaceTo(cand.a.allocSpot()) v.displaceTo(state) return } } func (v *Value) displaceTo(dest valueState) { if state, ok := dest.(stateSpilled); ok && state.aligned { VMOVDQA(v.Get(), dest.Op()) } else { VMOVDQU(v.Get(), dest.Op()) } v.setState(dest) } func (v *Value) freeSpot() { switch state := v.state.(type) { case stateLive: v.a.regs.free(state.Reg) v.reg = -1 case stateSpilled: v.a.stack.free(state.Slot) } } func (v *Value) useSpot() { switch state := v.state.(type) { case stateLive: v.a.regs[state.Reg] = struct{}{} v.reg = state.Reg case stateSpilled: v.a.stack[state.Slot] = struct{}{} } } func (v *Value) Free() { v.setState(nil) delete(v.a.values, v.id) } func (v *Value) Consume() VecPhysical { reg := v.Get() v.Free() return reg } func (v *Value) ConsumeOp() Op { op := v.GetOp() v.Free() return op } func (v *Value) HasReg() bool { return v.reg >= 0 } func (v *Value) allocReg() int { if v.reg >= 0 { return v.reg } return v.a.allocReg(v) } func (v *Value) Touch() { v.a.ctr++ v.age = v.a.ctr } func (v *Value) GetOp() Op { v.Touch() switch state := v.state.(type) { case stateLive: case stateSpilled: return state.GetMem() case stateLazy: if !state.Broadcast { return state.Mem } reg := v.allocReg() VPBROADCASTD(state.Mem, ymmRegs[reg]) v.setState(v.a.newStateLive(reg)) case stateEmpty: reg := v.allocReg() v.setState(v.a.newStateLive(reg)) } return v.state.(stateLive).Register() } func (v *Value) Get() VecPhysical { v.Touch() switch state := v.state.(type) { case stateLive: case stateSpilled: reg := v.allocReg() if state.aligned { VMOVDQA(state.GetMem(), v.a.phys[reg]) } else { VMOVDQU(state.GetMem(), v.a.phys[reg]) } v.setState(v.a.newStateLive(reg)) case stateLazy: reg := v.allocReg() v.setState(v.a.newStateLive(reg)) if !state.Broadcast { VMOVDQU(state.Mem, v.state.(stateLive).Register()) } else { VPBROADCASTD(state.Mem, v.state.(stateLive).Register()) } case stateEmpty: reg := v.allocReg() v.setState(v.a.newStateLive(reg)) } return v.state.(stateLive).Register() } func (v *Value) String() string { return fmt.Sprintf("Value(reg:%-2d state:%s)", v.reg, v.state) } golang-github-zeebo-blake3-0.2.4/bench/000077500000000000000000000000001512402427200175635ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/bench/table.py000066400000000000000000000056271512402427200212360ustar00rootroot00000000000000def load(path): out = {} for line in open(path): if not line.startswith('BenchmarkBLAKE3'): continue name, _, time, _, rate, _, _, _, _, _ = line.strip().split() _, kind, size = name.split('/') out[(kind, size)] = (int(float(time)), int(float(rate))) return out def scale(ns): if ns < 1000: return float(ns), 'ns' if ns < 1000000: return ns / 1000, 'µs' return ns / 1000000, 'ms' def print_short_row(bytes, size): fb = bench[('Entire', size)] reset = bench[('Reset', size)] fmt = '| {:6} | {:-3} ns | {:-3} ns | | {:-3} MB/s | {:-3} MB/s |' print(fmt.format(bytes, fb[0], reset[0], fb[1], reset[1])) def print_row(bench, row, size): inc = bench[('Incremental', size)] fb = bench[('Entire', size)] reset = bench[('Reset', size)] incr, incs = scale(inc[0]) fbr, fbs = scale(fb[0]) resetr, resets = scale(reset[0]) fmt = '| {:8} | {:-5.4} {} | {:-5.4} {} | {:-5.4} {} | | {:-4} MB/s | {:-4} MB/s | {:-4} MB/s |' print(fmt.format(row, incr, incs, fbr, fbs, resetr, resets, inc[1], fb[1], reset[1])) bench = load('bench.txt') bench_pure = load('bench-pure.txt') print("### Small") print() print('| Size | Full Buffer | Reset | | Full Buffer Rate | Reset Rate |') print('|--------|-------------|------------|-|------------------|--------------|') print_short_row('64 b', '0001_block') print_short_row('256 b', '0004_block') print_short_row('512 b', '0008_block') print_short_row('768 b', '0012_block') print() print("### Large") print() print('| Size | Incremental | Full Buffer | Reset | | Incremental Rate | Full Buffer Rate | Reset Rate |') print('|----------|-------------|-------------|------------|-|------------------|------------------|--------------|') print_row(bench, '1 kib', '0001_kib') print_row(bench, '2 kib', '0002_kib') print_row(bench, '4 kib', '0004_kib') print_row(bench, '8 kib', '0008_kib') print_row(bench, '16 kib', '0016_kib') print_row(bench, '32 kib', '0032_kib') print_row(bench, '64 kib', '0064_kib') print_row(bench, '128 kib', '0128_kib') print_row(bench, '256 kib', '0256_kib') print_row(bench, '512 kib', '0512_kib') print_row(bench, '1024 kib', '1024_kib') print() print("### No ASM") print() print('| Size | Incremental | Full Buffer | Reset | | Incremental Rate | Full Buffer Rate | Reset Rate |') print('|----------|-------------|-------------|------------|-|------------------|------------------|--------------|') print_row(bench_pure, '64 b', '0001_block') print_row(bench_pure, '256 b', '0004_block') print_row(bench_pure, '512 b', '0008_block') print_row(bench_pure, '768 b', '0012_block') print_row(bench_pure, '1 kib', '0016_block') print('| | | | | | | | |') print_row(bench_pure, '1 mib', '1024_kib') golang-github-zeebo-blake3-0.2.4/bench_test.go000066400000000000000000000060641512402427200211570ustar00rootroot00000000000000package blake3 import ( "fmt" "sync" "testing" "github.com/zeebo/blake3/internal/alg" "github.com/zeebo/blake3/internal/consts" ) func BenchmarkBLAKE3(b *testing.B) { out := make([]byte, 32) buf := make([]byte, 1024*1024+512) pool := sync.Pool{ New: func() interface{} { return new(hasher) }, } runIncr := func(b *testing.B, size int) { buf := buf[:size] b.ReportAllocs() b.SetBytes(int64(len(buf))) b.ResetTimer() for i := 0; i < b.N; i++ { h := new(hasher) t := buf for len(t) >= 1024 { h.update(t[:1024]) t = t[1024:] } if len(t) > 0 { h.update(t) } h.finalize(out) } } runEntire := func(b *testing.B, size int) { buf := buf[:size] b.ReportAllocs() b.SetBytes(int64(len(buf))) b.ResetTimer() for i := 0; i < b.N; i++ { h := new(hasher) h.update(buf) h.finalize(out) } } runReset := func(b *testing.B, size int) { buf := buf[:size] b.ReportAllocs() b.SetBytes(int64(len(buf))) b.ResetTimer() for i := 0; i < b.N; i++ { h := pool.Get().(*hasher) h.reset() h.update(buf) h.finalize(out) pool.Put(h) } } for _, kind := range []struct { name string run func(b *testing.B, size int) }{ {"Incremental", runIncr}, {"Entire", runEntire}, {"Reset", runReset}, } { b.Run(kind.name, func(b *testing.B) { run := kind.run for _, n := range []int{ 1, 4, 8, 12, 16, } { b.Run(fmt.Sprintf("%04d_block", n), func(b *testing.B) { run(b, n*64) }) } for _, n := range []int{ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, } { b.Run(fmt.Sprintf("%04d_kib", n), func(b *testing.B) { run(b, n*1024) }) } for _, n := range []int{ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, } { b.Run(fmt.Sprintf("%04d_kib+512", n), func(b *testing.B) { run(b, n*1024+512) }) } }) } } func BenchmarkHashF_1(b *testing.B) { var input [8192]byte var chain [8]uint32 var out chainVector b.SetBytes(1) b.ReportAllocs() b.ResetTimer() for i := 0; i < b.N; i++ { alg.HashF(&input, 1, 0, 0, &consts.IV, &out, &chain) } } func BenchmarkHashF_1536(b *testing.B) { var input [8192]byte var chain [8]uint32 var out chainVector b.SetBytes(1536) b.ReportAllocs() b.ResetTimer() for i := 0; i < b.N; i++ { alg.HashF(&input, 1536, 0, 0, &consts.IV, &out, &chain) } } func BenchmarkHashF_8K(b *testing.B) { var input [8192]byte var chain [8]uint32 var out chainVector b.SetBytes(8192) b.ReportAllocs() b.ResetTimer() for i := 0; i < b.N; i++ { alg.HashF(&input, 8192, 0, 0, &consts.IV, &out, &chain) } } func BenchmarkHashP(b *testing.B) { var left chainVector var right chainVector var out chainVector for n := 1; n <= 8; n++ { b.Run(fmt.Sprint(n), func(b *testing.B) { b.SetBytes(int64(64 * n)) b.ReportAllocs() b.ResetTimer() for i := 0; i < b.N; i++ { alg.HashP(&left, &right, 0, &consts.IV, &out, n) } }) } } func BenchmarkCompress(b *testing.B) { var c [8]uint32 var m, o [16]uint32 b.SetBytes(64) b.ReportAllocs() b.ResetTimer() for i := 0; i < b.N; i++ { alg.Compress(&c, &m, 0, 0, 0, &o) } } golang-github-zeebo-blake3-0.2.4/blake3.go000066400000000000000000000141761512402427200202050ustar00rootroot00000000000000package blake3 import ( "math/bits" "unsafe" "github.com/zeebo/blake3/internal/alg" "github.com/zeebo/blake3/internal/consts" "github.com/zeebo/blake3/internal/utils" ) // // hasher contains state for a blake3 hash // type hasher struct { len uint64 chunks uint64 flags uint32 key [8]uint32 stack cvstack buf [8192]byte } func (a *hasher) reset() { a.len = 0 a.chunks = 0 a.stack.occ = 0 a.stack.lvls = [8]uint8{} a.stack.bufn = 0 } func (a *hasher) update(buf []byte) { // relies on the first two words of a string being the same as a slice a.updateString(*(*string)(unsafe.Pointer(&buf))) } func (a *hasher) updateString(buf string) { var input *[8192]byte for len(buf) > 0 { if a.len == 0 && len(buf) > 8192 { // relies on the data pointer being the first word in the string header input = (*[8192]byte)(*(*unsafe.Pointer)(unsafe.Pointer(&buf))) buf = buf[8192:] } else if a.len < 8192 { n := copy(a.buf[a.len:], buf) a.len += uint64(n) buf = buf[n:] continue } else { input = &a.buf } a.consume(input) a.len = 0 a.chunks += 8 } } func (a *hasher) consume(input *[8192]byte) { var out chainVector var chain [8]uint32 alg.HashF(input, 8192, a.chunks, a.flags, &a.key, &out, &chain) a.stack.pushN(0, &out, 8, a.flags, &a.key) } func (a *hasher) finalize(p []byte) { var d Digest a.finalizeDigest(&d) _, _ = d.Read(p) } func (a *hasher) finalizeDigest(d *Digest) { if a.chunks == 0 && a.len <= consts.ChunkLen { compressAll(d, a.buf[:a.len], a.flags, a.key) return } d.chain = a.key d.flags = a.flags | consts.Flag_ChunkEnd if a.len > 64 { var buf chainVector alg.HashF(&a.buf, a.len, a.chunks, a.flags, &a.key, &buf, &d.chain) if a.len > consts.ChunkLen { complete := (a.len - 1) / consts.ChunkLen a.stack.pushN(0, &buf, int(complete), a.flags, &a.key) a.chunks += complete a.len = uint64(copy(a.buf[:], a.buf[complete*consts.ChunkLen:a.len])) } } if a.len <= 64 { d.flags |= consts.Flag_ChunkStart } d.counter = a.chunks d.blen = uint32(a.len) % 64 base := a.len / 64 * 64 if a.len > 0 && d.blen == 0 { d.blen = 64 base -= 64 } if consts.OptimizeLittleEndian { copy((*[64]byte)(unsafe.Pointer(&d.block[0]))[:], a.buf[base:a.len]) } else { var tmp [64]byte copy(tmp[:], a.buf[base:a.len]) utils.BytesToWords(&tmp, &d.block) } for a.stack.bufn > 0 { a.stack.flush(a.flags, &a.key) } var tmp [16]uint32 for occ := a.stack.occ; occ != 0; occ &= occ - 1 { col := uint(bits.TrailingZeros64(occ)) % 64 alg.Compress(&d.chain, &d.block, d.counter, d.blen, d.flags, &tmp) *(*[8]uint32)(unsafe.Pointer(&d.block[0])) = a.stack.stack[col] *(*[8]uint32)(unsafe.Pointer(&d.block[8])) = *(*[8]uint32)(unsafe.Pointer(&tmp[0])) if occ == a.stack.occ { d.chain = a.key d.counter = 0 d.blen = consts.BlockLen d.flags = a.flags | consts.Flag_Parent } } d.flags |= consts.Flag_Root } // // chain value stack // type chainVector = [64]uint32 type cvstack struct { occ uint64 // which levels in stack are occupied lvls [8]uint8 // what level the buf input was in bufn int // how many pairs are loaded into buf buf [2]chainVector stack [64][8]uint32 } func (a *cvstack) pushN(l uint8, cv *chainVector, n int, flags uint32, key *[8]uint32) { for i := 0; i < n; i++ { a.pushL(l, cv, i) for a.bufn == 8 { a.flush(flags, key) } } } func (a *cvstack) pushL(l uint8, cv *chainVector, n int) { bit := uint64(1) << (l & 63) if a.occ&bit == 0 { readChain(cv, n, &a.stack[l&63]) a.occ ^= bit return } a.lvls[a.bufn&7] = l writeChain(&a.stack[l&63], &a.buf[0], a.bufn) copyChain(cv, n, &a.buf[1], a.bufn) a.bufn++ a.occ ^= bit } func (a *cvstack) flush(flags uint32, key *[8]uint32) { var out chainVector alg.HashP(&a.buf[0], &a.buf[1], flags|consts.Flag_Parent, key, &out, a.bufn) bufn, lvls := a.bufn, a.lvls a.bufn, a.lvls = 0, [8]uint8{} for i := 0; i < bufn; i++ { a.pushL(lvls[i]+1, &out, i) } } // // helpers to deal with reading/writing transposed values // func copyChain(in *chainVector, icol int, out *chainVector, ocol int) { type u = uintptr type p = unsafe.Pointer type a = *uint32 i := p(u(p(in)) + u(icol*4)) o := p(u(p(out)) + u(ocol*4)) *a(p(u(o) + 0*32)) = *a(p(u(i) + 0*32)) *a(p(u(o) + 1*32)) = *a(p(u(i) + 1*32)) *a(p(u(o) + 2*32)) = *a(p(u(i) + 2*32)) *a(p(u(o) + 3*32)) = *a(p(u(i) + 3*32)) *a(p(u(o) + 4*32)) = *a(p(u(i) + 4*32)) *a(p(u(o) + 5*32)) = *a(p(u(i) + 5*32)) *a(p(u(o) + 6*32)) = *a(p(u(i) + 6*32)) *a(p(u(o) + 7*32)) = *a(p(u(i) + 7*32)) } func readChain(in *chainVector, col int, out *[8]uint32) { type u = uintptr type p = unsafe.Pointer type a = *uint32 i := p(u(p(in)) + u(col*4)) out[0] = *a(p(u(i) + 0*32)) out[1] = *a(p(u(i) + 1*32)) out[2] = *a(p(u(i) + 2*32)) out[3] = *a(p(u(i) + 3*32)) out[4] = *a(p(u(i) + 4*32)) out[5] = *a(p(u(i) + 5*32)) out[6] = *a(p(u(i) + 6*32)) out[7] = *a(p(u(i) + 7*32)) } func writeChain(in *[8]uint32, out *chainVector, col int) { type u = uintptr type p = unsafe.Pointer type a = *uint32 o := p(u(p(out)) + u(col*4)) *a(p(u(o) + 0*32)) = in[0] *a(p(u(o) + 1*32)) = in[1] *a(p(u(o) + 2*32)) = in[2] *a(p(u(o) + 3*32)) = in[3] *a(p(u(o) + 4*32)) = in[4] *a(p(u(o) + 5*32)) = in[5] *a(p(u(o) + 6*32)) = in[6] *a(p(u(o) + 7*32)) = in[7] } // // compress <= chunkLen bytes in one shot // func compressAll(d *Digest, in []byte, flags uint32, key [8]uint32) { var compressed [16]uint32 d.chain = key d.flags = flags | consts.Flag_ChunkStart for len(in) > 64 { buf := (*[64]byte)(unsafe.Pointer(&in[0])) var block *[16]uint32 if consts.OptimizeLittleEndian { block = (*[16]uint32)(unsafe.Pointer(buf)) } else { block = &d.block utils.BytesToWords(buf, block) } alg.Compress(&d.chain, block, 0, consts.BlockLen, d.flags, &compressed) d.chain = *(*[8]uint32)(unsafe.Pointer(&compressed[0])) d.flags &^= consts.Flag_ChunkStart in = in[64:] } if consts.OptimizeLittleEndian { copy((*[64]byte)(unsafe.Pointer(&d.block[0]))[:], in) } else { var tmp [64]byte copy(tmp[:], in) utils.BytesToWords(&tmp, &d.block) } d.blen = uint32(len(in)) d.flags |= consts.Flag_ChunkEnd | consts.Flag_Root } golang-github-zeebo-blake3-0.2.4/blake3_test.go000066400000000000000000000041011512402427200212270ustar00rootroot00000000000000package blake3 import ( "encoding/hex" "testing" "github.com/zeebo/assert" "github.com/zeebo/blake3/internal/consts" "github.com/zeebo/blake3/internal/utils" ) func TestHasher_Vectors(t *testing.T) { check := func(t *testing.T, h hasher, input []byte, hash string) { // ensure reset works h.update(input[:len(input)/2]) h.reset() // write and finalize a bunch for i := range input { var tmp [32]byte h.update(input[i : i+1]) switch i % 8193 { case 0, 1, 2: h.finalize(tmp[:]) default: } } // check every output length requested for i := 0; i <= len(hash)/2; i++ { buf := make([]byte, i) h.finalize(buf) assert.Equal(t, hash[:2*i], hex.EncodeToString(buf)) } // one more reset, full write, full read h.reset() h.update(input) buf := make([]byte, len(hash)/2) h.finalize(buf) assert.Equal(t, hash, hex.EncodeToString(buf)) } t.Run("Basic", func(t *testing.T) { for _, tv := range vectors { h := hasher{key: consts.IV} check(t, h, tv.input(), tv.hash) } }) t.Run("Keyed", func(t *testing.T) { for _, tv := range vectors { h := hasher{flags: consts.Flag_Keyed} utils.KeyFromBytes([]byte(testVectorKey), &h.key) check(t, h, tv.input(), tv.keyedHash) } }) t.Run("DeriveKey", func(t *testing.T) { var buf [32]byte for _, tv := range vectors { h := hasher{flags: consts.Flag_DeriveKeyContext, key: consts.IV} h.updateString(testVectorContext) h.finalize(buf[:]) h.reset() h.flags = consts.Flag_DeriveKeyMaterial utils.KeyFromBytes(buf[:], &h.key) check(t, h, tv.input(), tv.deriveKey) } }) } func TestHasherAlignment(t *testing.T) { // On little endian architectures, we can do unaligned accesses of // uint32 values during the hashing. This test is designed to cause // those unaligned accesses to occur. var buf [32]byte x := make([]byte, 8194) for i := range x { x[i] = byte(i) % 251 } h := hasher{key: consts.IV} h.update(x[1:]) h.finalize(buf[:]) assert.Equal(t, "981d32ed7aad9e408c5c36f6346c915ba11c2bd8b3e7d44902a11d7a141abdd9", hex.EncodeToString(buf[:])) } golang-github-zeebo-blake3-0.2.4/digest.go000066400000000000000000000042021512402427200203100ustar00rootroot00000000000000package blake3 import ( "fmt" "io" "unsafe" "github.com/zeebo/blake3/internal/alg" "github.com/zeebo/blake3/internal/consts" "github.com/zeebo/blake3/internal/utils" ) // Digest captures the state of a Hasher allowing reading and seeking through // the output stream. type Digest struct { counter uint64 chain [8]uint32 block [16]uint32 blen uint32 flags uint32 buf [16]uint32 bufn int } // Read reads data from the hasher into out. It always fills the entire buffer and // never errors. The stream will wrap around when reading past 2^64 bytes. func (d *Digest) Read(p []byte) (n int, err error) { n = len(p) if d.bufn > 0 { n := d.slowCopy(p) p = p[n:] d.bufn -= n } for len(p) >= 64 { d.fillBuf() if consts.OptimizeLittleEndian { *(*[64]byte)(unsafe.Pointer(&p[0])) = *(*[64]byte)(unsafe.Pointer(&d.buf[0])) } else { utils.WordsToBytes(&d.buf, p) } p = p[64:] d.bufn = 0 } if len(p) == 0 { return n, nil } d.fillBuf() d.bufn -= d.slowCopy(p) return n, nil } // Seek sets the position to the provided location. Only SeekStart and // SeekCurrent are allowed. func (d *Digest) Seek(offset int64, whence int) (int64, error) { switch whence { case io.SeekStart: case io.SeekEnd: return 0, fmt.Errorf("seek from end not supported") case io.SeekCurrent: offset += int64(consts.BlockLen*d.counter) - int64(d.bufn) default: return 0, fmt.Errorf("invalid whence: %d", whence) } if offset < 0 { return 0, fmt.Errorf("seek before start") } d.setPosition(uint64(offset)) return offset, nil } func (d *Digest) setPosition(pos uint64) { d.counter = pos / consts.BlockLen d.fillBuf() d.bufn -= int(pos % consts.BlockLen) } func (d *Digest) slowCopy(p []byte) (n int) { off := uint(consts.BlockLen-d.bufn) % consts.BlockLen if consts.OptimizeLittleEndian { n = copy(p, (*[consts.BlockLen]byte)(unsafe.Pointer(&d.buf[0]))[off:]) } else { var tmp [consts.BlockLen]byte utils.WordsToBytes(&d.buf, tmp[:]) n = copy(p, tmp[off:]) } return n } func (d *Digest) fillBuf() { alg.Compress(&d.chain, &d.block, d.counter, d.blen, d.flags, &d.buf) d.counter++ d.bufn = consts.BlockLen } golang-github-zeebo-blake3-0.2.4/fuzz_test.go000066400000000000000000000007061512402427200210730ustar00rootroot00000000000000package blake3 import ( "math/rand" "testing" ) func FuzzHash(f *testing.F) { f.Fuzz(func(t *testing.T, prog []byte) { l := 0 for _, v := range prog { l += int(v) } data := make([]byte, l) rand.New(rand.NewSource(0)).Read(data) h, b := New(), data for _, v := range prog { h.Write(b[:v]) b = b[v:] } v1 := h.Sum(nil) v2 := Sum256(data) if string(v1) != string(v2[:]) { t.Fatalf("v1: %v, v2: %v", v1, v2) } }) } golang-github-zeebo-blake3-0.2.4/go.mod000066400000000000000000000002311512402427200176060ustar00rootroot00000000000000module github.com/zeebo/blake3 go 1.18 require ( github.com/klauspost/cpuid/v2 v2.0.12 github.com/zeebo/assert v1.1.0 github.com/zeebo/pcg v1.0.1 ) golang-github-zeebo-blake3-0.2.4/go.sum000066400000000000000000000007671512402427200176510ustar00rootroot00000000000000github.com/klauspost/cpuid/v2 v2.0.12 h1:p9dKCg8i4gmOxtv35DvrYoWqYzQrvEVdjQ762Y0OqZE= github.com/klauspost/cpuid/v2 v2.0.12/go.mod h1:g2LTdtYhdyuGPqyWyv7qRAmj1WBqxuObKfj5c0PQa7c= github.com/zeebo/assert v1.1.0 h1:hU1L1vLTHsnO8x8c9KAR5GmM5QscxHg5RNU5z5qbUWY= github.com/zeebo/assert v1.1.0/go.mod h1:Pq9JiuJQpG8JLJdtkwrJESF0Foym2/D9XMU5ciN/wJ0= github.com/zeebo/pcg v1.0.1 h1:lyqfGeWiv4ahac6ttHs+I5hwtH/+1mrhlCtVNQM2kHo= github.com/zeebo/pcg v1.0.1/go.mod h1:09F0S9iiKrwn9rlI5yjLkmrug154/YRW6KnnXVDM/l4= golang-github-zeebo-blake3-0.2.4/internal/000077500000000000000000000000001512402427200203205ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/internal/alg/000077500000000000000000000000001512402427200210635ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/internal/alg/alg.go000066400000000000000000000011521512402427200221540ustar00rootroot00000000000000package alg import ( "github.com/zeebo/blake3/internal/alg/compress" "github.com/zeebo/blake3/internal/alg/hash" ) func HashF(input *[8192]byte, length, counter uint64, flags uint32, key *[8]uint32, out *[64]uint32, chain *[8]uint32) { hash.HashF(input, length, counter, flags, key, out, chain) } func HashP(left, right *[64]uint32, flags uint32, key *[8]uint32, out *[64]uint32, n int) { hash.HashP(left, right, flags, key, out, n) } func Compress(chain *[8]uint32, block *[16]uint32, counter uint64, blen uint32, flags uint32, out *[16]uint32) { compress.Compress(chain, block, counter, blen, flags, out) } golang-github-zeebo-blake3-0.2.4/internal/alg/compress/000077500000000000000000000000001512402427200227165ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/internal/alg/compress/compress.go000066400000000000000000000007431512402427200251040ustar00rootroot00000000000000package compress import ( "github.com/zeebo/blake3/internal/alg/compress/compress_pure" "github.com/zeebo/blake3/internal/alg/compress/compress_sse41" "github.com/zeebo/blake3/internal/consts" ) func Compress(chain *[8]uint32, block *[16]uint32, counter uint64, blen uint32, flags uint32, out *[16]uint32) { if consts.HasSSE41 { compress_sse41.Compress(chain, block, counter, blen, flags, out) } else { compress_pure.Compress(chain, block, counter, blen, flags, out) } } golang-github-zeebo-blake3-0.2.4/internal/alg/compress/compress_pure/000077500000000000000000000000001512402427200256045ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/internal/alg/compress/compress_pure/compress.go000066400000000000000000000076621512402427200300010ustar00rootroot00000000000000package compress_pure import ( "math/bits" "github.com/zeebo/blake3/internal/consts" ) func Compress( chain *[8]uint32, block *[16]uint32, counter uint64, blen uint32, flags uint32, out *[16]uint32, ) { *out = [16]uint32{ chain[0], chain[1], chain[2], chain[3], chain[4], chain[5], chain[6], chain[7], consts.IV0, consts.IV1, consts.IV2, consts.IV3, uint32(counter), uint32(counter >> 32), blen, flags, } rcompress(out, block) } func g(a, b, c, d, mx, my uint32) (uint32, uint32, uint32, uint32) { a += b + mx d = bits.RotateLeft32(d^a, -16) c += d b = bits.RotateLeft32(b^c, -12) a += b + my d = bits.RotateLeft32(d^a, -8) c += d b = bits.RotateLeft32(b^c, -7) return a, b, c, d } func rcompress(s *[16]uint32, m *[16]uint32) { const ( a = 10 b = 11 c = 12 d = 13 e = 14 f = 15 ) s0, s1, s2, s3 := s[0+0], s[0+1], s[0+2], s[0+3] s4, s5, s6, s7 := s[0+4], s[0+5], s[0+6], s[0+7] s8, s9, sa, sb := s[8+0], s[8+1], s[8+2], s[8+3] sc, sd, se, sf := s[8+4], s[8+5], s[8+6], s[8+7] s0, s4, s8, sc = g(s0, s4, s8, sc, m[0], m[1]) s1, s5, s9, sd = g(s1, s5, s9, sd, m[2], m[3]) s2, s6, sa, se = g(s2, s6, sa, se, m[4], m[5]) s3, s7, sb, sf = g(s3, s7, sb, sf, m[6], m[7]) s0, s5, sa, sf = g(s0, s5, sa, sf, m[8], m[9]) s1, s6, sb, sc = g(s1, s6, sb, sc, m[a], m[b]) s2, s7, s8, sd = g(s2, s7, s8, sd, m[c], m[d]) s3, s4, s9, se = g(s3, s4, s9, se, m[e], m[f]) s0, s4, s8, sc = g(s0, s4, s8, sc, m[2], m[6]) s1, s5, s9, sd = g(s1, s5, s9, sd, m[3], m[a]) s2, s6, sa, se = g(s2, s6, sa, se, m[7], m[0]) s3, s7, sb, sf = g(s3, s7, sb, sf, m[4], m[d]) s0, s5, sa, sf = g(s0, s5, sa, sf, m[1], m[b]) s1, s6, sb, sc = g(s1, s6, sb, sc, m[c], m[5]) s2, s7, s8, sd = g(s2, s7, s8, sd, m[9], m[e]) s3, s4, s9, se = g(s3, s4, s9, se, m[f], m[8]) s0, s4, s8, sc = g(s0, s4, s8, sc, m[3], m[4]) s1, s5, s9, sd = g(s1, s5, s9, sd, m[a], m[c]) s2, s6, sa, se = g(s2, s6, sa, se, m[d], m[2]) s3, s7, sb, sf = g(s3, s7, sb, sf, m[7], m[e]) s0, s5, sa, sf = g(s0, s5, sa, sf, m[6], m[5]) s1, s6, sb, sc = g(s1, s6, sb, sc, m[9], m[0]) s2, s7, s8, sd = g(s2, s7, s8, sd, m[b], m[f]) s3, s4, s9, se = g(s3, s4, s9, se, m[8], m[1]) s0, s4, s8, sc = g(s0, s4, s8, sc, m[a], m[7]) s1, s5, s9, sd = g(s1, s5, s9, sd, m[c], m[9]) s2, s6, sa, se = g(s2, s6, sa, se, m[e], m[3]) s3, s7, sb, sf = g(s3, s7, sb, sf, m[d], m[f]) s0, s5, sa, sf = g(s0, s5, sa, sf, m[4], m[0]) s1, s6, sb, sc = g(s1, s6, sb, sc, m[b], m[2]) s2, s7, s8, sd = g(s2, s7, s8, sd, m[5], m[8]) s3, s4, s9, se = g(s3, s4, s9, se, m[1], m[6]) s0, s4, s8, sc = g(s0, s4, s8, sc, m[c], m[d]) s1, s5, s9, sd = g(s1, s5, s9, sd, m[9], m[b]) s2, s6, sa, se = g(s2, s6, sa, se, m[f], m[a]) s3, s7, sb, sf = g(s3, s7, sb, sf, m[e], m[8]) s0, s5, sa, sf = g(s0, s5, sa, sf, m[7], m[2]) s1, s6, sb, sc = g(s1, s6, sb, sc, m[5], m[3]) s2, s7, s8, sd = g(s2, s7, s8, sd, m[0], m[1]) s3, s4, s9, se = g(s3, s4, s9, se, m[6], m[4]) s0, s4, s8, sc = g(s0, s4, s8, sc, m[9], m[e]) s1, s5, s9, sd = g(s1, s5, s9, sd, m[b], m[5]) s2, s6, sa, se = g(s2, s6, sa, se, m[8], m[c]) s3, s7, sb, sf = g(s3, s7, sb, sf, m[f], m[1]) s0, s5, sa, sf = g(s0, s5, sa, sf, m[d], m[3]) s1, s6, sb, sc = g(s1, s6, sb, sc, m[0], m[a]) s2, s7, s8, sd = g(s2, s7, s8, sd, m[2], m[6]) s3, s4, s9, se = g(s3, s4, s9, se, m[4], m[7]) s0, s4, s8, sc = g(s0, s4, s8, sc, m[b], m[f]) s1, s5, s9, sd = g(s1, s5, s9, sd, m[5], m[0]) s2, s6, sa, se = g(s2, s6, sa, se, m[1], m[9]) s3, s7, sb, sf = g(s3, s7, sb, sf, m[8], m[6]) s0, s5, sa, sf = g(s0, s5, sa, sf, m[e], m[a]) s1, s6, sb, sc = g(s1, s6, sb, sc, m[2], m[c]) s2, s7, s8, sd = g(s2, s7, s8, sd, m[3], m[4]) s3, s4, s9, se = g(s3, s4, s9, se, m[7], m[d]) s[8+0] = s8 ^ s[0] s[8+1] = s9 ^ s[1] s[8+2] = sa ^ s[2] s[8+3] = sb ^ s[3] s[8+4] = sc ^ s[4] s[8+5] = sd ^ s[5] s[8+6] = se ^ s[6] s[8+7] = sf ^ s[7] s[0] = s0 ^ s8 s[1] = s1 ^ s9 s[2] = s2 ^ sa s[3] = s3 ^ sb s[4] = s4 ^ sc s[5] = s5 ^ sd s[6] = s6 ^ se s[7] = s7 ^ sf } golang-github-zeebo-blake3-0.2.4/internal/alg/compress/compress_sse41/000077500000000000000000000000001512402427200255705ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/internal/alg/compress/compress_sse41/impl_amd64.s000066400000000000000000000273041512402427200277160ustar00rootroot00000000000000// Code generated by command: go run compress.go. DO NOT EDIT. #include "textflag.h" DATA iv<>+0(SB)/4, $0x6a09e667 DATA iv<>+4(SB)/4, $0xbb67ae85 DATA iv<>+8(SB)/4, $0x3c6ef372 DATA iv<>+12(SB)/4, $0xa54ff53a DATA iv<>+16(SB)/4, $0x510e527f DATA iv<>+20(SB)/4, $0x9b05688c DATA iv<>+24(SB)/4, $0x1f83d9ab DATA iv<>+28(SB)/4, $0x5be0cd19 GLOBL iv<>(SB), RODATA|NOPTR, $32 DATA rot16_shuf<>+0(SB)/1, $0x02 DATA rot16_shuf<>+1(SB)/1, $0x03 DATA rot16_shuf<>+2(SB)/1, $0x00 DATA rot16_shuf<>+3(SB)/1, $0x01 DATA rot16_shuf<>+4(SB)/1, $0x06 DATA rot16_shuf<>+5(SB)/1, $0x07 DATA rot16_shuf<>+6(SB)/1, $0x04 DATA rot16_shuf<>+7(SB)/1, $0x05 DATA rot16_shuf<>+8(SB)/1, $0x0a DATA rot16_shuf<>+9(SB)/1, $0x0b DATA rot16_shuf<>+10(SB)/1, $0x08 DATA rot16_shuf<>+11(SB)/1, $0x09 DATA rot16_shuf<>+12(SB)/1, $0x0e DATA rot16_shuf<>+13(SB)/1, $0x0f DATA rot16_shuf<>+14(SB)/1, $0x0c DATA rot16_shuf<>+15(SB)/1, $0x0d DATA rot16_shuf<>+16(SB)/1, $0x12 DATA rot16_shuf<>+17(SB)/1, $0x13 DATA rot16_shuf<>+18(SB)/1, $0x10 DATA rot16_shuf<>+19(SB)/1, $0x11 DATA rot16_shuf<>+20(SB)/1, $0x16 DATA rot16_shuf<>+21(SB)/1, $0x17 DATA rot16_shuf<>+22(SB)/1, $0x14 DATA rot16_shuf<>+23(SB)/1, $0x15 DATA rot16_shuf<>+24(SB)/1, $0x1a DATA rot16_shuf<>+25(SB)/1, $0x1b DATA rot16_shuf<>+26(SB)/1, $0x18 DATA rot16_shuf<>+27(SB)/1, $0x19 DATA rot16_shuf<>+28(SB)/1, $0x1e DATA rot16_shuf<>+29(SB)/1, $0x1f DATA rot16_shuf<>+30(SB)/1, $0x1c DATA rot16_shuf<>+31(SB)/1, $0x1d GLOBL rot16_shuf<>(SB), RODATA|NOPTR, $32 DATA rot8_shuf<>+0(SB)/1, $0x01 DATA rot8_shuf<>+1(SB)/1, $0x02 DATA rot8_shuf<>+2(SB)/1, $0x03 DATA rot8_shuf<>+3(SB)/1, $0x00 DATA rot8_shuf<>+4(SB)/1, $0x05 DATA rot8_shuf<>+5(SB)/1, $0x06 DATA rot8_shuf<>+6(SB)/1, $0x07 DATA rot8_shuf<>+7(SB)/1, $0x04 DATA rot8_shuf<>+8(SB)/1, $0x09 DATA rot8_shuf<>+9(SB)/1, $0x0a DATA rot8_shuf<>+10(SB)/1, $0x0b DATA rot8_shuf<>+11(SB)/1, $0x08 DATA rot8_shuf<>+12(SB)/1, $0x0d DATA rot8_shuf<>+13(SB)/1, $0x0e DATA rot8_shuf<>+14(SB)/1, $0x0f DATA rot8_shuf<>+15(SB)/1, $0x0c DATA rot8_shuf<>+16(SB)/1, $0x11 DATA rot8_shuf<>+17(SB)/1, $0x12 DATA rot8_shuf<>+18(SB)/1, $0x13 DATA rot8_shuf<>+19(SB)/1, $0x10 DATA rot8_shuf<>+20(SB)/1, $0x15 DATA rot8_shuf<>+21(SB)/1, $0x16 DATA rot8_shuf<>+22(SB)/1, $0x17 DATA rot8_shuf<>+23(SB)/1, $0x14 DATA rot8_shuf<>+24(SB)/1, $0x19 DATA rot8_shuf<>+25(SB)/1, $0x1a DATA rot8_shuf<>+26(SB)/1, $0x1b DATA rot8_shuf<>+27(SB)/1, $0x18 DATA rot8_shuf<>+28(SB)/1, $0x1d DATA rot8_shuf<>+29(SB)/1, $0x1e DATA rot8_shuf<>+30(SB)/1, $0x1f DATA rot8_shuf<>+31(SB)/1, $0x1c GLOBL rot8_shuf<>(SB), RODATA|NOPTR, $32 // func Compress(chain *[8]uint32, block *[16]uint32, counter uint64, blen uint32, flags uint32, out *[16]uint32) // Requires: SSE, SSE2, SSE4.1, SSSE3 TEXT ·Compress(SB), NOSPLIT, $0-40 MOVQ chain+0(FP), AX MOVQ block+8(FP), CX MOVQ counter+16(FP), DX MOVL blen+24(FP), BX MOVL flags+28(FP), SI MOVQ out+32(FP), DI MOVUPS (AX), X0 MOVUPS 16(AX), X1 MOVUPS iv<>+0(SB), X2 PINSRD $0x00, DX, X3 SHRQ $0x20, DX PINSRD $0x01, DX, X3 PINSRD $0x02, BX, X3 PINSRD $0x03, SI, X3 MOVUPS (CX), X4 MOVUPS 16(CX), X5 MOVUPS 32(CX), X6 MOVUPS 48(CX), X7 MOVUPS rot16_shuf<>+0(SB), X8 MOVUPS rot8_shuf<>+0(SB), X9 // round 1 MOVAPS X4, X10 SHUFPS $0x88, X5, X10 PADDD X10, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X8, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X11 PSRLL $0x0c, X1 PSLLL $0x14, X11 POR X11, X1 MOVAPS X4, X4 SHUFPS $0xdd, X5, X4 PADDD X4, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X9, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X5 PSRLL $0x07, X1 PSLLL $0x19, X5 POR X5, X1 PSHUFD $0x93, X0, X0 PSHUFD $0x4e, X3, X3 PSHUFD $0x39, X2, X2 MOVAPS X6, X5 SHUFPS $0x88, X7, X5 SHUFPS $0x93, X5, X5 PADDD X5, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X8, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X11 PSRLL $0x0c, X1 PSLLL $0x14, X11 POR X11, X1 MOVAPS X6, X6 SHUFPS $0xdd, X7, X6 SHUFPS $0x93, X6, X6 PADDD X6, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X9, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X7 PSRLL $0x07, X1 PSLLL $0x19, X7 POR X7, X1 PSHUFD $0x39, X0, X0 PSHUFD $0x4e, X3, X3 PSHUFD $0x93, X2, X2 // round 2 MOVAPS X10, X7 SHUFPS $0xd6, X4, X7 SHUFPS $0x39, X7, X7 PADDD X7, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X8, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X11 PSRLL $0x0c, X1 PSLLL $0x14, X11 POR X11, X1 MOVAPS X5, X11 SHUFPS $0xfa, X6, X11 PSHUFD $0x0f, X10, X10 PBLENDW $0x33, X10, X11 PADDD X11, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X9, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X10 PSRLL $0x07, X1 PSLLL $0x19, X10 POR X10, X1 PSHUFD $0x93, X0, X0 PSHUFD $0x4e, X3, X3 PSHUFD $0x39, X2, X2 MOVAPS X6, X12 PUNPCKLLQ X4, X12 PBLENDW $0xc0, X5, X12 SHUFPS $0xb4, X12, X12 PADDD X12, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X8, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X10 PSRLL $0x0c, X1 PSLLL $0x14, X10 POR X10, X1 MOVAPS X4, X10 PUNPCKHLQ X6, X10 MOVAPS X5, X4 PUNPCKLLQ X10, X4 SHUFPS $0x1e, X4, X4 PADDD X4, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X9, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X5 PSRLL $0x07, X1 PSLLL $0x19, X5 POR X5, X1 PSHUFD $0x39, X0, X0 PSHUFD $0x4e, X3, X3 PSHUFD $0x93, X2, X2 // round 3 MOVAPS X7, X5 SHUFPS $0xd6, X11, X5 SHUFPS $0x39, X5, X5 PADDD X5, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X8, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X6 PSRLL $0x0c, X1 PSLLL $0x14, X6 POR X6, X1 MOVAPS X12, X6 SHUFPS $0xfa, X4, X6 PSHUFD $0x0f, X7, X7 PBLENDW $0x33, X7, X6 PADDD X6, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X9, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X7 PSRLL $0x07, X1 PSLLL $0x19, X7 POR X7, X1 PSHUFD $0x93, X0, X0 PSHUFD $0x4e, X3, X3 PSHUFD $0x39, X2, X2 MOVAPS X4, X10 PUNPCKLLQ X11, X10 PBLENDW $0xc0, X12, X10 SHUFPS $0xb4, X10, X10 PADDD X10, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X8, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X7 PSRLL $0x0c, X1 PSLLL $0x14, X7 POR X7, X1 MOVAPS X11, X7 PUNPCKHLQ X4, X7 MOVAPS X12, X4 PUNPCKLLQ X7, X4 SHUFPS $0x1e, X4, X4 PADDD X4, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X9, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X7 PSRLL $0x07, X1 PSLLL $0x19, X7 POR X7, X1 PSHUFD $0x39, X0, X0 PSHUFD $0x4e, X3, X3 PSHUFD $0x93, X2, X2 // round 4 MOVAPS X5, X7 SHUFPS $0xd6, X6, X7 SHUFPS $0x39, X7, X7 PADDD X7, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X8, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X11 PSRLL $0x0c, X1 PSLLL $0x14, X11 POR X11, X1 MOVAPS X10, X11 SHUFPS $0xfa, X4, X11 PSHUFD $0x0f, X5, X5 PBLENDW $0x33, X5, X11 PADDD X11, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X9, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X5 PSRLL $0x07, X1 PSLLL $0x19, X5 POR X5, X1 PSHUFD $0x93, X0, X0 PSHUFD $0x4e, X3, X3 PSHUFD $0x39, X2, X2 MOVAPS X4, X12 PUNPCKLLQ X6, X12 PBLENDW $0xc0, X10, X12 SHUFPS $0xb4, X12, X12 PADDD X12, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X8, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X5 PSRLL $0x0c, X1 PSLLL $0x14, X5 POR X5, X1 MOVAPS X6, X5 PUNPCKHLQ X4, X5 MOVAPS X10, X4 PUNPCKLLQ X5, X4 SHUFPS $0x1e, X4, X4 PADDD X4, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X9, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X5 PSRLL $0x07, X1 PSLLL $0x19, X5 POR X5, X1 PSHUFD $0x39, X0, X0 PSHUFD $0x4e, X3, X3 PSHUFD $0x93, X2, X2 // round 5 MOVAPS X7, X5 SHUFPS $0xd6, X11, X5 SHUFPS $0x39, X5, X5 PADDD X5, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X8, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X6 PSRLL $0x0c, X1 PSLLL $0x14, X6 POR X6, X1 MOVAPS X12, X6 SHUFPS $0xfa, X4, X6 PSHUFD $0x0f, X7, X7 PBLENDW $0x33, X7, X6 PADDD X6, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X9, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X7 PSRLL $0x07, X1 PSLLL $0x19, X7 POR X7, X1 PSHUFD $0x93, X0, X0 PSHUFD $0x4e, X3, X3 PSHUFD $0x39, X2, X2 MOVAPS X4, X10 PUNPCKLLQ X11, X10 PBLENDW $0xc0, X12, X10 SHUFPS $0xb4, X10, X10 PADDD X10, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X8, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X7 PSRLL $0x0c, X1 PSLLL $0x14, X7 POR X7, X1 MOVAPS X11, X7 PUNPCKHLQ X4, X7 MOVAPS X12, X4 PUNPCKLLQ X7, X4 SHUFPS $0x1e, X4, X4 PADDD X4, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X9, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X7 PSRLL $0x07, X1 PSLLL $0x19, X7 POR X7, X1 PSHUFD $0x39, X0, X0 PSHUFD $0x4e, X3, X3 PSHUFD $0x93, X2, X2 // round 6 MOVAPS X5, X7 SHUFPS $0xd6, X6, X7 SHUFPS $0x39, X7, X7 PADDD X7, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X8, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X11 PSRLL $0x0c, X1 PSLLL $0x14, X11 POR X11, X1 MOVAPS X10, X11 SHUFPS $0xfa, X4, X11 PSHUFD $0x0f, X5, X5 PBLENDW $0x33, X5, X11 PADDD X11, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X9, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X5 PSRLL $0x07, X1 PSLLL $0x19, X5 POR X5, X1 PSHUFD $0x93, X0, X0 PSHUFD $0x4e, X3, X3 PSHUFD $0x39, X2, X2 MOVAPS X4, X12 PUNPCKLLQ X6, X12 PBLENDW $0xc0, X10, X12 SHUFPS $0xb4, X12, X12 PADDD X12, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X8, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X5 PSRLL $0x0c, X1 PSLLL $0x14, X5 POR X5, X1 MOVAPS X6, X5 PUNPCKHLQ X4, X5 MOVAPS X10, X4 PUNPCKLLQ X5, X4 SHUFPS $0x1e, X4, X4 PADDD X4, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X9, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X5 PSRLL $0x07, X1 PSLLL $0x19, X5 POR X5, X1 PSHUFD $0x39, X0, X0 PSHUFD $0x4e, X3, X3 PSHUFD $0x93, X2, X2 // round 7 MOVAPS X7, X5 SHUFPS $0xd6, X11, X5 SHUFPS $0x39, X5, X5 PADDD X5, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X8, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X5 PSRLL $0x0c, X1 PSLLL $0x14, X5 POR X5, X1 MOVAPS X12, X5 SHUFPS $0xfa, X4, X5 PSHUFD $0x0f, X7, X6 PBLENDW $0x33, X6, X5 PADDD X5, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X9, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X5 PSRLL $0x07, X1 PSLLL $0x19, X5 POR X5, X1 PSHUFD $0x93, X0, X0 PSHUFD $0x4e, X3, X3 PSHUFD $0x39, X2, X2 MOVAPS X4, X5 PUNPCKLLQ X11, X5 PBLENDW $0xc0, X12, X5 SHUFPS $0xb4, X5, X5 PADDD X5, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X8, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X5 PSRLL $0x0c, X1 PSLLL $0x14, X5 POR X5, X1 MOVAPS X11, X6 PUNPCKHLQ X4, X6 MOVAPS X12, X4 PUNPCKLLQ X6, X4 SHUFPS $0x1e, X4, X4 PADDD X4, X0 PADDD X1, X0 PXOR X0, X3 PSHUFB X9, X3 PADDD X3, X2 PXOR X2, X1 MOVAPS X1, X4 PSRLL $0x07, X1 PSLLL $0x19, X4 POR X4, X1 PSHUFD $0x39, X0, X0 PSHUFD $0x4e, X3, X3 PSHUFD $0x93, X2, X2 // finalize PXOR X2, X0 PXOR X3, X1 MOVUPS (AX), X4 PXOR X4, X2 MOVUPS 16(AX), X4 PXOR X4, X3 MOVUPS X0, (DI) MOVUPS X1, 16(DI) MOVUPS X2, 32(DI) MOVUPS X3, 48(DI) RET golang-github-zeebo-blake3-0.2.4/internal/alg/compress/compress_sse41/impl_other.go000066400000000000000000000004661512402427200302670ustar00rootroot00000000000000//go:build !amd64 // +build !amd64 package compress_sse41 import "github.com/zeebo/blake3/internal/alg/compress/compress_pure" func Compress(chain *[8]uint32, block *[16]uint32, counter uint64, blen uint32, flags uint32, out *[16]uint32) { compress_pure.Compress(chain, block, counter, blen, flags, out) } golang-github-zeebo-blake3-0.2.4/internal/alg/compress/compress_sse41/impl_test.go000066400000000000000000000014351512402427200301220ustar00rootroot00000000000000package compress_sse41_test import ( "testing" "github.com/zeebo/assert" "github.com/zeebo/blake3/internal/alg/compress/compress_pure" "github.com/zeebo/blake3/internal/alg/compress/compress_sse41" "github.com/zeebo/blake3/internal/consts" "github.com/zeebo/pcg" ) func TestCompress(t *testing.T) { if !consts.HasSSE41 { t.SkipNow() } var chain [8]uint32 var block [16]uint32 for i := 0; i < 1e5; i++ { var o1, o2 [16]uint32 counter, blen, flags := pcg.Uint64(), pcg.Uint32(), pcg.Uint32() for i := range &chain { chain[i] = pcg.Uint32() } for i := range &block { block[i] = pcg.Uint32() } compress_sse41.Compress(&chain, &block, counter, blen, flags, &o1) compress_pure.Compress(&chain, &block, counter, blen, flags, &o2) assert.Equal(t, o1, o2) } } golang-github-zeebo-blake3-0.2.4/internal/alg/compress/compress_sse41/stubs.go000066400000000000000000000002671512402427200272640ustar00rootroot00000000000000//go:build amd64 // +build amd64 package compress_sse41 //go:noescape func Compress(chain *[8]uint32, block *[16]uint32, counter uint64, blen uint32, flags uint32, out *[16]uint32) golang-github-zeebo-blake3-0.2.4/internal/alg/hash/000077500000000000000000000000001512402427200220065ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/internal/alg/hash/hash.go000066400000000000000000000013431512402427200232610ustar00rootroot00000000000000package hash import ( "github.com/zeebo/blake3/internal/alg/hash/hash_avx2" "github.com/zeebo/blake3/internal/alg/hash/hash_pure" "github.com/zeebo/blake3/internal/consts" ) func HashF(input *[8192]byte, length, counter uint64, flags uint32, key *[8]uint32, out *[64]uint32, chain *[8]uint32) { if consts.HasAVX2 && length > 2*consts.ChunkLen { hash_avx2.HashF(input, length, counter, flags, key, out, chain) } else { hash_pure.HashF(input, length, counter, flags, key, out, chain) } } func HashP(left, right *[64]uint32, flags uint32, key *[8]uint32, out *[64]uint32, n int) { if consts.HasAVX2 && n >= 2 { hash_avx2.HashP(left, right, flags, key, out, n) } else { hash_pure.HashP(left, right, flags, key, out, n) } } golang-github-zeebo-blake3-0.2.4/internal/alg/hash/hash_avx2/000077500000000000000000000000001512402427200236715ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/internal/alg/hash/hash_avx2/impl_amd64.s000066400000000000000000001674061512402427200260270ustar00rootroot00000000000000// Code generated by command: go run main.go. DO NOT EDIT. #include "textflag.h" DATA iv<>+0(SB)/4, $0x6a09e667 DATA iv<>+4(SB)/4, $0xbb67ae85 DATA iv<>+8(SB)/4, $0x3c6ef372 DATA iv<>+12(SB)/4, $0xa54ff53a DATA iv<>+16(SB)/4, $0x510e527f DATA iv<>+20(SB)/4, $0x9b05688c DATA iv<>+24(SB)/4, $0x1f83d9ab DATA iv<>+28(SB)/4, $0x5be0cd19 GLOBL iv<>(SB), RODATA|NOPTR, $32 DATA rot16_shuf<>+0(SB)/1, $0x02 DATA rot16_shuf<>+1(SB)/1, $0x03 DATA rot16_shuf<>+2(SB)/1, $0x00 DATA rot16_shuf<>+3(SB)/1, $0x01 DATA rot16_shuf<>+4(SB)/1, $0x06 DATA rot16_shuf<>+5(SB)/1, $0x07 DATA rot16_shuf<>+6(SB)/1, $0x04 DATA rot16_shuf<>+7(SB)/1, $0x05 DATA rot16_shuf<>+8(SB)/1, $0x0a DATA rot16_shuf<>+9(SB)/1, $0x0b DATA rot16_shuf<>+10(SB)/1, $0x08 DATA rot16_shuf<>+11(SB)/1, $0x09 DATA rot16_shuf<>+12(SB)/1, $0x0e DATA rot16_shuf<>+13(SB)/1, $0x0f DATA rot16_shuf<>+14(SB)/1, $0x0c DATA rot16_shuf<>+15(SB)/1, $0x0d DATA rot16_shuf<>+16(SB)/1, $0x12 DATA rot16_shuf<>+17(SB)/1, $0x13 DATA rot16_shuf<>+18(SB)/1, $0x10 DATA rot16_shuf<>+19(SB)/1, $0x11 DATA rot16_shuf<>+20(SB)/1, $0x16 DATA rot16_shuf<>+21(SB)/1, $0x17 DATA rot16_shuf<>+22(SB)/1, $0x14 DATA rot16_shuf<>+23(SB)/1, $0x15 DATA rot16_shuf<>+24(SB)/1, $0x1a DATA rot16_shuf<>+25(SB)/1, $0x1b DATA rot16_shuf<>+26(SB)/1, $0x18 DATA rot16_shuf<>+27(SB)/1, $0x19 DATA rot16_shuf<>+28(SB)/1, $0x1e DATA rot16_shuf<>+29(SB)/1, $0x1f DATA rot16_shuf<>+30(SB)/1, $0x1c DATA rot16_shuf<>+31(SB)/1, $0x1d GLOBL rot16_shuf<>(SB), RODATA|NOPTR, $32 DATA rot8_shuf<>+0(SB)/1, $0x01 DATA rot8_shuf<>+1(SB)/1, $0x02 DATA rot8_shuf<>+2(SB)/1, $0x03 DATA rot8_shuf<>+3(SB)/1, $0x00 DATA rot8_shuf<>+4(SB)/1, $0x05 DATA rot8_shuf<>+5(SB)/1, $0x06 DATA rot8_shuf<>+6(SB)/1, $0x07 DATA rot8_shuf<>+7(SB)/1, $0x04 DATA rot8_shuf<>+8(SB)/1, $0x09 DATA rot8_shuf<>+9(SB)/1, $0x0a DATA rot8_shuf<>+10(SB)/1, $0x0b DATA rot8_shuf<>+11(SB)/1, $0x08 DATA rot8_shuf<>+12(SB)/1, $0x0d DATA rot8_shuf<>+13(SB)/1, $0x0e DATA rot8_shuf<>+14(SB)/1, $0x0f DATA rot8_shuf<>+15(SB)/1, $0x0c DATA rot8_shuf<>+16(SB)/1, $0x11 DATA rot8_shuf<>+17(SB)/1, $0x12 DATA rot8_shuf<>+18(SB)/1, $0x13 DATA rot8_shuf<>+19(SB)/1, $0x10 DATA rot8_shuf<>+20(SB)/1, $0x15 DATA rot8_shuf<>+21(SB)/1, $0x16 DATA rot8_shuf<>+22(SB)/1, $0x17 DATA rot8_shuf<>+23(SB)/1, $0x14 DATA rot8_shuf<>+24(SB)/1, $0x19 DATA rot8_shuf<>+25(SB)/1, $0x1a DATA rot8_shuf<>+26(SB)/1, $0x1b DATA rot8_shuf<>+27(SB)/1, $0x18 DATA rot8_shuf<>+28(SB)/1, $0x1d DATA rot8_shuf<>+29(SB)/1, $0x1e DATA rot8_shuf<>+30(SB)/1, $0x1f DATA rot8_shuf<>+31(SB)/1, $0x1c GLOBL rot8_shuf<>(SB), RODATA|NOPTR, $32 DATA block_len<>+0(SB)/4, $0x00000040 DATA block_len<>+4(SB)/4, $0x00000040 DATA block_len<>+8(SB)/4, $0x00000040 DATA block_len<>+12(SB)/4, $0x00000040 DATA block_len<>+16(SB)/4, $0x00000040 DATA block_len<>+20(SB)/4, $0x00000040 DATA block_len<>+24(SB)/4, $0x00000040 DATA block_len<>+28(SB)/4, $0x00000040 GLOBL block_len<>(SB), RODATA|NOPTR, $32 DATA zero<>+0(SB)/4, $0x00000000 DATA zero<>+4(SB)/4, $0x00000000 DATA zero<>+8(SB)/4, $0x00000000 DATA zero<>+12(SB)/4, $0x00000000 DATA zero<>+16(SB)/4, $0x00000000 DATA zero<>+20(SB)/4, $0x00000000 DATA zero<>+24(SB)/4, $0x00000000 DATA zero<>+28(SB)/4, $0x00000000 GLOBL zero<>(SB), RODATA|NOPTR, $32 DATA counter<>+0(SB)/8, $0x0000000000000000 DATA counter<>+8(SB)/8, $0x0000000000000001 DATA counter<>+16(SB)/8, $0x0000000000000002 DATA counter<>+24(SB)/8, $0x0000000000000003 DATA counter<>+32(SB)/8, $0x0000000000000004 DATA counter<>+40(SB)/8, $0x0000000000000005 DATA counter<>+48(SB)/8, $0x0000000000000006 DATA counter<>+56(SB)/8, $0x0000000000000007 GLOBL counter<>(SB), RODATA|NOPTR, $64 // func HashF(input *[8192]byte, length uint64, counter uint64, flags uint32, key *[8]uint32, out *[32]uint32, chain *[8]uint32) // Requires: AVX, AVX2 TEXT ·HashF(SB), $688-56 MOVQ input+0(FP), AX MOVQ length+8(FP), CX MOVQ counter+16(FP), DX MOVL flags+24(FP), BX MOVQ key+32(FP), SI MOVQ out+40(FP), DI MOVQ chain+48(FP), R8 // Allocate local space and align it LEAQ 31(SP), R11 MOVQ $0x000000000000001f, R9 NOTQ R9 ANDQ R9, R11 // Skip if the length is zero XORQ R9, R9 XORQ R10, R10 TESTQ CX, CX JZ skip_compute // Compute complete chunks and blocks SUBQ $0x01, CX MOVQ CX, R9 SHRQ $0x0a, R9 MOVQ CX, R10 ANDQ $0x000003c0, R10 skip_compute: // Load some params into the stack (avo improvment?) MOVL BX, 64(SP) MOVQ DX, 72(SP) // Load IV into vectors VPBROADCASTD (SI), Y0 VPBROADCASTD 4(SI), Y1 VPBROADCASTD 8(SI), Y2 VPBROADCASTD 12(SI), Y3 VPBROADCASTD 16(SI), Y4 VPBROADCASTD 20(SI), Y5 VPBROADCASTD 24(SI), Y6 VPBROADCASTD 28(SI), Y7 // Build and store counter data on the stack VPBROADCASTQ 72(SP), Y8 VPADDQ counter<>+0(SB), Y8, Y8 VPBROADCASTQ 72(SP), Y9 VPADDQ counter<>+32(SB), Y9, Y9 VPUNPCKLDQ Y9, Y8, Y10 VPUNPCKHDQ Y9, Y8, Y8 VPUNPCKLDQ Y8, Y10, Y9 VPUNPCKHDQ Y8, Y10, Y8 VPERMQ $0xd8, Y9, Y9 VPERMQ $0xd8, Y8, Y8 VMOVDQU Y9, 112(SP) VMOVDQU Y8, 144(SP) // Set up block flags and variables for iteration XORQ CX, CX ORL $0x01, 64(SP) loop: // Include end flags if last block CMPQ CX, $0x000003c0 JNE round_setup ORL $0x02, 64(SP) round_setup: // Load and transpose message vectors VMOVDQU (AX)(CX*1), Y8 VMOVDQU 1024(AX)(CX*1), Y9 VMOVDQU 2048(AX)(CX*1), Y10 VMOVDQU 3072(AX)(CX*1), Y11 VMOVDQU 4096(AX)(CX*1), Y12 VMOVDQU 5120(AX)(CX*1), Y13 VMOVDQU 6144(AX)(CX*1), Y14 VMOVDQU 7168(AX)(CX*1), Y15 VMOVDQA Y0, (R11) VPUNPCKLDQ Y9, Y8, Y0 VPUNPCKHDQ Y9, Y8, Y8 VPUNPCKLDQ Y11, Y10, Y9 VPUNPCKHDQ Y11, Y10, Y10 VPUNPCKLDQ Y13, Y12, Y11 VPUNPCKHDQ Y13, Y12, Y12 VPUNPCKLDQ Y15, Y14, Y13 VPUNPCKHDQ Y15, Y14, Y14 VPUNPCKLQDQ Y9, Y0, Y15 VPUNPCKHQDQ Y9, Y0, Y0 VPUNPCKLQDQ Y10, Y8, Y9 VPUNPCKHQDQ Y10, Y8, Y8 VPUNPCKLQDQ Y13, Y11, Y10 VPUNPCKHQDQ Y13, Y11, Y11 VPUNPCKLQDQ Y14, Y12, Y13 VPUNPCKHQDQ Y14, Y12, Y12 VINSERTI128 $0x01, X10, Y15, Y14 VPERM2I128 $0x31, Y10, Y15, Y10 VINSERTI128 $0x01, X11, Y0, Y15 VPERM2I128 $0x31, Y11, Y0, Y0 VINSERTI128 $0x01, X13, Y9, Y11 VPERM2I128 $0x31, Y13, Y9, Y9 VINSERTI128 $0x01, X12, Y8, Y13 VPERM2I128 $0x31, Y12, Y8, Y8 VMOVDQU Y14, 176(SP) VMOVDQU Y15, 208(SP) VMOVDQU Y11, 240(SP) VMOVDQU Y13, 272(SP) VMOVDQU Y10, 304(SP) VMOVDQU Y0, 336(SP) VMOVDQU Y9, 368(SP) VMOVDQU Y8, 400(SP) VMOVDQU 32(AX)(CX*1), Y0 VMOVDQU 1056(AX)(CX*1), Y8 VMOVDQU 2080(AX)(CX*1), Y9 VMOVDQU 3104(AX)(CX*1), Y10 VMOVDQU 4128(AX)(CX*1), Y11 VMOVDQU 5152(AX)(CX*1), Y12 VMOVDQU 6176(AX)(CX*1), Y13 VMOVDQU 7200(AX)(CX*1), Y14 VPUNPCKLDQ Y8, Y0, Y15 VPUNPCKHDQ Y8, Y0, Y0 VPUNPCKLDQ Y10, Y9, Y8 VPUNPCKHDQ Y10, Y9, Y9 VPUNPCKLDQ Y12, Y11, Y10 VPUNPCKHDQ Y12, Y11, Y11 VPUNPCKLDQ Y14, Y13, Y12 VPUNPCKHDQ Y14, Y13, Y13 VPUNPCKLQDQ Y8, Y15, Y14 VPUNPCKHQDQ Y8, Y15, Y8 VPUNPCKLQDQ Y9, Y0, Y15 VPUNPCKHQDQ Y9, Y0, Y0 VPUNPCKLQDQ Y12, Y10, Y9 VPUNPCKHQDQ Y12, Y10, Y10 VPUNPCKLQDQ Y13, Y11, Y12 VPUNPCKHQDQ Y13, Y11, Y11 VINSERTI128 $0x01, X9, Y14, Y13 VPERM2I128 $0x31, Y9, Y14, Y9 VINSERTI128 $0x01, X10, Y8, Y14 VPERM2I128 $0x31, Y10, Y8, Y8 VINSERTI128 $0x01, X12, Y15, Y10 VPERM2I128 $0x31, Y12, Y15, Y12 VINSERTI128 $0x01, X11, Y0, Y15 VPERM2I128 $0x31, Y11, Y0, Y0 VMOVDQU Y13, 432(SP) VMOVDQU Y14, 464(SP) VMOVDQU Y10, 496(SP) VMOVDQU Y15, 528(SP) VMOVDQU Y9, 560(SP) VMOVDQU Y8, 592(SP) VMOVDQU Y12, 624(SP) VMOVDQU Y0, 656(SP) // Load constants for the round VMOVDQA (R11), Y0 VMOVDQU block_len<>+0(SB), Y8 VPBROADCASTD 64(SP), Y9 VPBROADCASTD iv<>+0(SB), Y10 VPBROADCASTD iv<>+4(SB), Y11 VPBROADCASTD iv<>+8(SB), Y12 VPBROADCASTD iv<>+12(SB), Y13 VMOVDQU 112(SP), Y14 VMOVDQU 144(SP), Y15 // Save state for partial chunk if necessary CMPQ CX, R10 JNE begin_rounds VMOVDQU Y0, 80(SP) MOVL 80(SP)(R9*4), DX MOVL DX, (R8) VMOVDQU Y1, 80(SP) MOVL 80(SP)(R9*4), DX MOVL DX, 4(R8) VMOVDQU Y2, 80(SP) MOVL 80(SP)(R9*4), DX MOVL DX, 8(R8) VMOVDQU Y3, 80(SP) MOVL 80(SP)(R9*4), DX MOVL DX, 12(R8) VMOVDQU Y4, 80(SP) MOVL 80(SP)(R9*4), DX MOVL DX, 16(R8) VMOVDQU Y5, 80(SP) MOVL 80(SP)(R9*4), DX MOVL DX, 20(R8) VMOVDQU Y6, 80(SP) MOVL 80(SP)(R9*4), DX MOVL DX, 24(R8) VMOVDQU Y7, 80(SP) MOVL 80(SP)(R9*4), DX MOVL DX, 28(R8) begin_rounds: // Perform the rounds // Round 1 VPADDD 176(SP), Y0, Y0 VPADDD 240(SP), Y1, Y1 VPADDD 304(SP), Y2, Y2 VPADDD 368(SP), Y3, Y3 VPADDD Y4, Y0, Y0 VPXOR Y0, Y14, Y14 VPSHUFB rot16_shuf<>+0(SB), Y14, Y14 VPADDD Y5, Y1, Y1 VPXOR Y1, Y15, Y15 VPSHUFB rot16_shuf<>+0(SB), Y15, Y15 VPADDD Y6, Y2, Y2 VPXOR Y2, Y8, Y8 VPSHUFB rot16_shuf<>+0(SB), Y8, Y8 VPADDD Y7, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y14, Y10, Y10 VPXOR Y10, Y4, Y4 VPADDD Y15, Y11, Y11 VPXOR Y11, Y5, Y5 VPADDD Y8, Y12, Y12 VPXOR Y12, Y6, Y6 VPADDD Y9, Y13, Y13 VPXOR Y13, Y7, Y7 VMOVDQA Y0, (R11) VPSRLD $0x0c, Y4, Y0 VPSLLD $0x14, Y4, Y4 VPOR Y0, Y4, Y0 VPSRLD $0x0c, Y5, Y4 VPSLLD $0x14, Y5, Y5 VPOR Y4, Y5, Y4 VPSRLD $0x0c, Y6, Y5 VPSLLD $0x14, Y6, Y6 VPOR Y5, Y6, Y5 VPSRLD $0x0c, Y7, Y6 VPSLLD $0x14, Y7, Y7 VPOR Y6, Y7, Y6 VMOVDQA (R11), Y7 VPADDD 208(SP), Y7, Y7 VPADDD 272(SP), Y1, Y1 VPADDD 336(SP), Y2, Y2 VPADDD 400(SP), Y3, Y3 VPADDD Y0, Y7, Y7 VPXOR Y7, Y14, Y14 VPSHUFB rot8_shuf<>+0(SB), Y14, Y14 VPADDD Y4, Y1, Y1 VPXOR Y1, Y15, Y15 VPSHUFB rot8_shuf<>+0(SB), Y15, Y15 VPADDD Y5, Y2, Y2 VPXOR Y2, Y8, Y8 VPSHUFB rot8_shuf<>+0(SB), Y8, Y8 VPADDD Y6, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y14, Y10, Y10 VPXOR Y10, Y0, Y0 VPADDD Y15, Y11, Y11 VPXOR Y11, Y4, Y4 VPADDD Y8, Y12, Y12 VPXOR Y12, Y5, Y5 VPADDD Y9, Y13, Y13 VPXOR Y13, Y6, Y6 VMOVDQA Y7, (R11) VPSRLD $0x07, Y0, Y7 VPSLLD $0x19, Y0, Y0 VPOR Y7, Y0, Y0 VPSRLD $0x07, Y4, Y7 VPSLLD $0x19, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x07, Y5, Y7 VPSLLD $0x19, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x07, Y6, Y7 VPSLLD $0x19, Y6, Y6 VPOR Y7, Y6, Y6 VMOVDQA (R11), Y7 VPADDD 432(SP), Y7, Y7 VPADDD 496(SP), Y1, Y1 VPADDD 560(SP), Y2, Y2 VPADDD 624(SP), Y3, Y3 VPADDD Y4, Y7, Y7 VPXOR Y7, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y5, Y1, Y1 VPXOR Y1, Y14, Y14 VPSHUFB rot16_shuf<>+0(SB), Y14, Y14 VPADDD Y6, Y2, Y2 VPXOR Y2, Y15, Y15 VPSHUFB rot16_shuf<>+0(SB), Y15, Y15 VPADDD Y0, Y3, Y3 VPXOR Y3, Y8, Y8 VPSHUFB rot16_shuf<>+0(SB), Y8, Y8 VPADDD Y9, Y12, Y12 VPXOR Y12, Y4, Y4 VPADDD Y14, Y13, Y13 VPXOR Y13, Y5, Y5 VPADDD Y15, Y10, Y10 VPXOR Y10, Y6, Y6 VPADDD Y8, Y11, Y11 VPXOR Y11, Y0, Y0 VMOVDQA Y7, (R11) VPSRLD $0x0c, Y4, Y7 VPSLLD $0x14, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x0c, Y5, Y7 VPSLLD $0x14, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x0c, Y6, Y7 VPSLLD $0x14, Y6, Y6 VPOR Y7, Y6, Y6 VPSRLD $0x0c, Y0, Y7 VPSLLD $0x14, Y0, Y0 VPOR Y7, Y0, Y0 VMOVDQA (R11), Y7 VPADDD 464(SP), Y7, Y7 VPADDD 528(SP), Y1, Y1 VPADDD 592(SP), Y2, Y2 VPADDD 656(SP), Y3, Y3 VPADDD Y4, Y7, Y7 VPXOR Y7, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y5, Y1, Y1 VPXOR Y1, Y14, Y14 VPSHUFB rot8_shuf<>+0(SB), Y14, Y14 VPADDD Y6, Y2, Y2 VPXOR Y2, Y15, Y15 VPSHUFB rot8_shuf<>+0(SB), Y15, Y15 VPADDD Y0, Y3, Y3 VPXOR Y3, Y8, Y8 VPSHUFB rot8_shuf<>+0(SB), Y8, Y8 VPADDD Y9, Y12, Y12 VPXOR Y12, Y4, Y4 VPADDD Y14, Y13, Y13 VPXOR Y13, Y5, Y5 VPADDD Y15, Y10, Y10 VPXOR Y10, Y6, Y6 VPADDD Y8, Y11, Y11 VPXOR Y11, Y0, Y0 VMOVDQA Y7, (R11) VPSRLD $0x07, Y4, Y7 VPSLLD $0x19, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x07, Y5, Y7 VPSLLD $0x19, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x07, Y6, Y7 VPSLLD $0x19, Y6, Y6 VPOR Y7, Y6, Y6 VPSRLD $0x07, Y0, Y7 VPSLLD $0x19, Y0, Y0 VPOR Y7, Y0, Y0 // Round 2 VMOVDQA (R11), Y7 VPADDD 240(SP), Y7, Y7 VPADDD 272(SP), Y1, Y1 VPADDD 400(SP), Y2, Y2 VPADDD 304(SP), Y3, Y3 VPADDD Y0, Y7, Y7 VPXOR Y7, Y14, Y14 VPSHUFB rot16_shuf<>+0(SB), Y14, Y14 VPADDD Y4, Y1, Y1 VPXOR Y1, Y15, Y15 VPSHUFB rot16_shuf<>+0(SB), Y15, Y15 VPADDD Y5, Y2, Y2 VPXOR Y2, Y8, Y8 VPSHUFB rot16_shuf<>+0(SB), Y8, Y8 VPADDD Y6, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y14, Y10, Y10 VPXOR Y10, Y0, Y0 VPADDD Y15, Y11, Y11 VPXOR Y11, Y4, Y4 VPADDD Y8, Y12, Y12 VPXOR Y12, Y5, Y5 VPADDD Y9, Y13, Y13 VPXOR Y13, Y6, Y6 VMOVDQA Y7, (R11) VPSRLD $0x0c, Y0, Y7 VPSLLD $0x14, Y0, Y0 VPOR Y7, Y0, Y0 VPSRLD $0x0c, Y4, Y7 VPSLLD $0x14, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x0c, Y5, Y7 VPSLLD $0x14, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x0c, Y6, Y7 VPSLLD $0x14, Y6, Y6 VPOR Y7, Y6, Y6 VMOVDQA (R11), Y7 VPADDD 368(SP), Y7, Y7 VPADDD 496(SP), Y1, Y1 VPADDD 176(SP), Y2, Y2 VPADDD 592(SP), Y3, Y3 VPADDD Y0, Y7, Y7 VPXOR Y7, Y14, Y14 VPSHUFB rot8_shuf<>+0(SB), Y14, Y14 VPADDD Y4, Y1, Y1 VPXOR Y1, Y15, Y15 VPSHUFB rot8_shuf<>+0(SB), Y15, Y15 VPADDD Y5, Y2, Y2 VPXOR Y2, Y8, Y8 VPSHUFB rot8_shuf<>+0(SB), Y8, Y8 VPADDD Y6, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y14, Y10, Y10 VPXOR Y10, Y0, Y0 VPADDD Y15, Y11, Y11 VPXOR Y11, Y4, Y4 VPADDD Y8, Y12, Y12 VPXOR Y12, Y5, Y5 VPADDD Y9, Y13, Y13 VPXOR Y13, Y6, Y6 VMOVDQA Y7, (R11) VPSRLD $0x07, Y0, Y7 VPSLLD $0x19, Y0, Y0 VPOR Y7, Y0, Y0 VPSRLD $0x07, Y4, Y7 VPSLLD $0x19, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x07, Y5, Y7 VPSLLD $0x19, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x07, Y6, Y7 VPSLLD $0x19, Y6, Y6 VPOR Y7, Y6, Y6 VMOVDQA (R11), Y7 VPADDD 208(SP), Y7, Y7 VPADDD 560(SP), Y1, Y1 VPADDD 464(SP), Y2, Y2 VPADDD 656(SP), Y3, Y3 VPADDD Y4, Y7, Y7 VPXOR Y7, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y5, Y1, Y1 VPXOR Y1, Y14, Y14 VPSHUFB rot16_shuf<>+0(SB), Y14, Y14 VPADDD Y6, Y2, Y2 VPXOR Y2, Y15, Y15 VPSHUFB rot16_shuf<>+0(SB), Y15, Y15 VPADDD Y0, Y3, Y3 VPXOR Y3, Y8, Y8 VPSHUFB rot16_shuf<>+0(SB), Y8, Y8 VPADDD Y9, Y12, Y12 VPXOR Y12, Y4, Y4 VPADDD Y14, Y13, Y13 VPXOR Y13, Y5, Y5 VPADDD Y15, Y10, Y10 VPXOR Y10, Y6, Y6 VPADDD Y8, Y11, Y11 VPXOR Y11, Y0, Y0 VMOVDQA Y7, (R11) VPSRLD $0x0c, Y4, Y7 VPSLLD $0x14, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x0c, Y5, Y7 VPSLLD $0x14, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x0c, Y6, Y7 VPSLLD $0x14, Y6, Y6 VPOR Y7, Y6, Y6 VPSRLD $0x0c, Y0, Y7 VPSLLD $0x14, Y0, Y0 VPOR Y7, Y0, Y0 VMOVDQA (R11), Y7 VPADDD 528(SP), Y7, Y7 VPADDD 336(SP), Y1, Y1 VPADDD 624(SP), Y2, Y2 VPADDD 432(SP), Y3, Y3 VPADDD Y4, Y7, Y7 VPXOR Y7, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y5, Y1, Y1 VPXOR Y1, Y14, Y14 VPSHUFB rot8_shuf<>+0(SB), Y14, Y14 VPADDD Y6, Y2, Y2 VPXOR Y2, Y15, Y15 VPSHUFB rot8_shuf<>+0(SB), Y15, Y15 VPADDD Y0, Y3, Y3 VPXOR Y3, Y8, Y8 VPSHUFB rot8_shuf<>+0(SB), Y8, Y8 VPADDD Y9, Y12, Y12 VPXOR Y12, Y4, Y4 VPADDD Y14, Y13, Y13 VPXOR Y13, Y5, Y5 VPADDD Y15, Y10, Y10 VPXOR Y10, Y6, Y6 VPADDD Y8, Y11, Y11 VPXOR Y11, Y0, Y0 VMOVDQA Y7, (R11) VPSRLD $0x07, Y4, Y7 VPSLLD $0x19, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x07, Y5, Y7 VPSLLD $0x19, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x07, Y6, Y7 VPSLLD $0x19, Y6, Y6 VPOR Y7, Y6, Y6 VPSRLD $0x07, Y0, Y7 VPSLLD $0x19, Y0, Y0 VPOR Y7, Y0, Y0 // Round 3 VMOVDQA (R11), Y7 VPADDD 272(SP), Y7, Y7 VPADDD 496(SP), Y1, Y1 VPADDD 592(SP), Y2, Y2 VPADDD 400(SP), Y3, Y3 VPADDD Y0, Y7, Y7 VPXOR Y7, Y14, Y14 VPSHUFB rot16_shuf<>+0(SB), Y14, Y14 VPADDD Y4, Y1, Y1 VPXOR Y1, Y15, Y15 VPSHUFB rot16_shuf<>+0(SB), Y15, Y15 VPADDD Y5, Y2, Y2 VPXOR Y2, Y8, Y8 VPSHUFB rot16_shuf<>+0(SB), Y8, Y8 VPADDD Y6, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y14, Y10, Y10 VPXOR Y10, Y0, Y0 VPADDD Y15, Y11, Y11 VPXOR Y11, Y4, Y4 VPADDD Y8, Y12, Y12 VPXOR Y12, Y5, Y5 VPADDD Y9, Y13, Y13 VPXOR Y13, Y6, Y6 VMOVDQA Y7, (R11) VPSRLD $0x0c, Y0, Y7 VPSLLD $0x14, Y0, Y0 VPOR Y7, Y0, Y0 VPSRLD $0x0c, Y4, Y7 VPSLLD $0x14, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x0c, Y5, Y7 VPSLLD $0x14, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x0c, Y6, Y7 VPSLLD $0x14, Y6, Y6 VPOR Y7, Y6, Y6 VMOVDQA (R11), Y7 VPADDD 304(SP), Y7, Y7 VPADDD 560(SP), Y1, Y1 VPADDD 240(SP), Y2, Y2 VPADDD 624(SP), Y3, Y3 VPADDD Y0, Y7, Y7 VPXOR Y7, Y14, Y14 VPSHUFB rot8_shuf<>+0(SB), Y14, Y14 VPADDD Y4, Y1, Y1 VPXOR Y1, Y15, Y15 VPSHUFB rot8_shuf<>+0(SB), Y15, Y15 VPADDD Y5, Y2, Y2 VPXOR Y2, Y8, Y8 VPSHUFB rot8_shuf<>+0(SB), Y8, Y8 VPADDD Y6, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y14, Y10, Y10 VPXOR Y10, Y0, Y0 VPADDD Y15, Y11, Y11 VPXOR Y11, Y4, Y4 VPADDD Y8, Y12, Y12 VPXOR Y12, Y5, Y5 VPADDD Y9, Y13, Y13 VPXOR Y13, Y6, Y6 VMOVDQA Y7, (R11) VPSRLD $0x07, Y0, Y7 VPSLLD $0x19, Y0, Y0 VPOR Y7, Y0, Y0 VPSRLD $0x07, Y4, Y7 VPSLLD $0x19, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x07, Y5, Y7 VPSLLD $0x19, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x07, Y6, Y7 VPSLLD $0x19, Y6, Y6 VPOR Y7, Y6, Y6 VMOVDQA (R11), Y7 VPADDD 368(SP), Y7, Y7 VPADDD 464(SP), Y1, Y1 VPADDD 528(SP), Y2, Y2 VPADDD 432(SP), Y3, Y3 VPADDD Y4, Y7, Y7 VPXOR Y7, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y5, Y1, Y1 VPXOR Y1, Y14, Y14 VPSHUFB rot16_shuf<>+0(SB), Y14, Y14 VPADDD Y6, Y2, Y2 VPXOR Y2, Y15, Y15 VPSHUFB rot16_shuf<>+0(SB), Y15, Y15 VPADDD Y0, Y3, Y3 VPXOR Y3, Y8, Y8 VPSHUFB rot16_shuf<>+0(SB), Y8, Y8 VPADDD Y9, Y12, Y12 VPXOR Y12, Y4, Y4 VPADDD Y14, Y13, Y13 VPXOR Y13, Y5, Y5 VPADDD Y15, Y10, Y10 VPXOR Y10, Y6, Y6 VPADDD Y8, Y11, Y11 VPXOR Y11, Y0, Y0 VMOVDQA Y7, (R11) VPSRLD $0x0c, Y4, Y7 VPSLLD $0x14, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x0c, Y5, Y7 VPSLLD $0x14, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x0c, Y6, Y7 VPSLLD $0x14, Y6, Y6 VPOR Y7, Y6, Y6 VPSRLD $0x0c, Y0, Y7 VPSLLD $0x14, Y0, Y0 VPOR Y7, Y0, Y0 VMOVDQA (R11), Y7 VPADDD 336(SP), Y7, Y7 VPADDD 176(SP), Y1, Y1 VPADDD 656(SP), Y2, Y2 VPADDD 208(SP), Y3, Y3 VPADDD Y4, Y7, Y7 VPXOR Y7, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y5, Y1, Y1 VPXOR Y1, Y14, Y14 VPSHUFB rot8_shuf<>+0(SB), Y14, Y14 VPADDD Y6, Y2, Y2 VPXOR Y2, Y15, Y15 VPSHUFB rot8_shuf<>+0(SB), Y15, Y15 VPADDD Y0, Y3, Y3 VPXOR Y3, Y8, Y8 VPSHUFB rot8_shuf<>+0(SB), Y8, Y8 VPADDD Y9, Y12, Y12 VPXOR Y12, Y4, Y4 VPADDD Y14, Y13, Y13 VPXOR Y13, Y5, Y5 VPADDD Y15, Y10, Y10 VPXOR Y10, Y6, Y6 VPADDD Y8, Y11, Y11 VPXOR Y11, Y0, Y0 VMOVDQA Y7, (R11) VPSRLD $0x07, Y4, Y7 VPSLLD $0x19, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x07, Y5, Y7 VPSLLD $0x19, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x07, Y6, Y7 VPSLLD $0x19, Y6, Y6 VPOR Y7, Y6, Y6 VPSRLD $0x07, Y0, Y7 VPSLLD $0x19, Y0, Y0 VPOR Y7, Y0, Y0 // Round 4 VMOVDQA (R11), Y7 VPADDD 496(SP), Y7, Y7 VPADDD 560(SP), Y1, Y1 VPADDD 624(SP), Y2, Y2 VPADDD 592(SP), Y3, Y3 VPADDD Y0, Y7, Y7 VPXOR Y7, Y14, Y14 VPSHUFB rot16_shuf<>+0(SB), Y14, Y14 VPADDD Y4, Y1, Y1 VPXOR Y1, Y15, Y15 VPSHUFB rot16_shuf<>+0(SB), Y15, Y15 VPADDD Y5, Y2, Y2 VPXOR Y2, Y8, Y8 VPSHUFB rot16_shuf<>+0(SB), Y8, Y8 VPADDD Y6, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y14, Y10, Y10 VPXOR Y10, Y0, Y0 VPADDD Y15, Y11, Y11 VPXOR Y11, Y4, Y4 VPADDD Y8, Y12, Y12 VPXOR Y12, Y5, Y5 VPADDD Y9, Y13, Y13 VPXOR Y13, Y6, Y6 VMOVDQA Y7, (R11) VPSRLD $0x0c, Y0, Y7 VPSLLD $0x14, Y0, Y0 VPOR Y7, Y0, Y0 VPSRLD $0x0c, Y4, Y7 VPSLLD $0x14, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x0c, Y5, Y7 VPSLLD $0x14, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x0c, Y6, Y7 VPSLLD $0x14, Y6, Y6 VPOR Y7, Y6, Y6 VMOVDQA (R11), Y7 VPADDD 400(SP), Y7, Y7 VPADDD 464(SP), Y1, Y1 VPADDD 272(SP), Y2, Y2 VPADDD 656(SP), Y3, Y3 VPADDD Y0, Y7, Y7 VPXOR Y7, Y14, Y14 VPSHUFB rot8_shuf<>+0(SB), Y14, Y14 VPADDD Y4, Y1, Y1 VPXOR Y1, Y15, Y15 VPSHUFB rot8_shuf<>+0(SB), Y15, Y15 VPADDD Y5, Y2, Y2 VPXOR Y2, Y8, Y8 VPSHUFB rot8_shuf<>+0(SB), Y8, Y8 VPADDD Y6, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y14, Y10, Y10 VPXOR Y10, Y0, Y0 VPADDD Y15, Y11, Y11 VPXOR Y11, Y4, Y4 VPADDD Y8, Y12, Y12 VPXOR Y12, Y5, Y5 VPADDD Y9, Y13, Y13 VPXOR Y13, Y6, Y6 VMOVDQA Y7, (R11) VPSRLD $0x07, Y0, Y7 VPSLLD $0x19, Y0, Y0 VPOR Y7, Y0, Y0 VPSRLD $0x07, Y4, Y7 VPSLLD $0x19, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x07, Y5, Y7 VPSLLD $0x19, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x07, Y6, Y7 VPSLLD $0x19, Y6, Y6 VPOR Y7, Y6, Y6 VMOVDQA (R11), Y7 VPADDD 304(SP), Y7, Y7 VPADDD 528(SP), Y1, Y1 VPADDD 336(SP), Y2, Y2 VPADDD 208(SP), Y3, Y3 VPADDD Y4, Y7, Y7 VPXOR Y7, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y5, Y1, Y1 VPXOR Y1, Y14, Y14 VPSHUFB rot16_shuf<>+0(SB), Y14, Y14 VPADDD Y6, Y2, Y2 VPXOR Y2, Y15, Y15 VPSHUFB rot16_shuf<>+0(SB), Y15, Y15 VPADDD Y0, Y3, Y3 VPXOR Y3, Y8, Y8 VPSHUFB rot16_shuf<>+0(SB), Y8, Y8 VPADDD Y9, Y12, Y12 VPXOR Y12, Y4, Y4 VPADDD Y14, Y13, Y13 VPXOR Y13, Y5, Y5 VPADDD Y15, Y10, Y10 VPXOR Y10, Y6, Y6 VPADDD Y8, Y11, Y11 VPXOR Y11, Y0, Y0 VMOVDQA Y7, (R11) VPSRLD $0x0c, Y4, Y7 VPSLLD $0x14, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x0c, Y5, Y7 VPSLLD $0x14, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x0c, Y6, Y7 VPSLLD $0x14, Y6, Y6 VPOR Y7, Y6, Y6 VPSRLD $0x0c, Y0, Y7 VPSLLD $0x14, Y0, Y0 VPOR Y7, Y0, Y0 VMOVDQA (R11), Y7 VPADDD 176(SP), Y7, Y7 VPADDD 240(SP), Y1, Y1 VPADDD 432(SP), Y2, Y2 VPADDD 368(SP), Y3, Y3 VPADDD Y4, Y7, Y7 VPXOR Y7, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y5, Y1, Y1 VPXOR Y1, Y14, Y14 VPSHUFB rot8_shuf<>+0(SB), Y14, Y14 VPADDD Y6, Y2, Y2 VPXOR Y2, Y15, Y15 VPSHUFB rot8_shuf<>+0(SB), Y15, Y15 VPADDD Y0, Y3, Y3 VPXOR Y3, Y8, Y8 VPSHUFB rot8_shuf<>+0(SB), Y8, Y8 VPADDD Y9, Y12, Y12 VPXOR Y12, Y4, Y4 VPADDD Y14, Y13, Y13 VPXOR Y13, Y5, Y5 VPADDD Y15, Y10, Y10 VPXOR Y10, Y6, Y6 VPADDD Y8, Y11, Y11 VPXOR Y11, Y0, Y0 VMOVDQA Y7, (R11) VPSRLD $0x07, Y4, Y7 VPSLLD $0x19, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x07, Y5, Y7 VPSLLD $0x19, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x07, Y6, Y7 VPSLLD $0x19, Y6, Y6 VPOR Y7, Y6, Y6 VPSRLD $0x07, Y0, Y7 VPSLLD $0x19, Y0, Y0 VPOR Y7, Y0, Y0 // Round 5 VMOVDQA (R11), Y7 VPADDD 560(SP), Y7, Y7 VPADDD 464(SP), Y1, Y1 VPADDD 656(SP), Y2, Y2 VPADDD 624(SP), Y3, Y3 VPADDD Y0, Y7, Y7 VPXOR Y7, Y14, Y14 VPSHUFB rot16_shuf<>+0(SB), Y14, Y14 VPADDD Y4, Y1, Y1 VPXOR Y1, Y15, Y15 VPSHUFB rot16_shuf<>+0(SB), Y15, Y15 VPADDD Y5, Y2, Y2 VPXOR Y2, Y8, Y8 VPSHUFB rot16_shuf<>+0(SB), Y8, Y8 VPADDD Y6, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y14, Y10, Y10 VPXOR Y10, Y0, Y0 VPADDD Y15, Y11, Y11 VPXOR Y11, Y4, Y4 VPADDD Y8, Y12, Y12 VPXOR Y12, Y5, Y5 VPADDD Y9, Y13, Y13 VPXOR Y13, Y6, Y6 VMOVDQA Y7, (R11) VPSRLD $0x0c, Y0, Y7 VPSLLD $0x14, Y0, Y0 VPOR Y7, Y0, Y0 VPSRLD $0x0c, Y4, Y7 VPSLLD $0x14, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x0c, Y5, Y7 VPSLLD $0x14, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x0c, Y6, Y7 VPSLLD $0x14, Y6, Y6 VPOR Y7, Y6, Y6 VMOVDQA (R11), Y7 VPADDD 592(SP), Y7, Y7 VPADDD 528(SP), Y1, Y1 VPADDD 496(SP), Y2, Y2 VPADDD 432(SP), Y3, Y3 VPADDD Y0, Y7, Y7 VPXOR Y7, Y14, Y14 VPSHUFB rot8_shuf<>+0(SB), Y14, Y14 VPADDD Y4, Y1, Y1 VPXOR Y1, Y15, Y15 VPSHUFB rot8_shuf<>+0(SB), Y15, Y15 VPADDD Y5, Y2, Y2 VPXOR Y2, Y8, Y8 VPSHUFB rot8_shuf<>+0(SB), Y8, Y8 VPADDD Y6, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y14, Y10, Y10 VPXOR Y10, Y0, Y0 VPADDD Y15, Y11, Y11 VPXOR Y11, Y4, Y4 VPADDD Y8, Y12, Y12 VPXOR Y12, Y5, Y5 VPADDD Y9, Y13, Y13 VPXOR Y13, Y6, Y6 VMOVDQA Y7, (R11) VPSRLD $0x07, Y0, Y7 VPSLLD $0x19, Y0, Y0 VPOR Y7, Y0, Y0 VPSRLD $0x07, Y4, Y7 VPSLLD $0x19, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x07, Y5, Y7 VPSLLD $0x19, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x07, Y6, Y7 VPSLLD $0x19, Y6, Y6 VPOR Y7, Y6, Y6 VMOVDQA (R11), Y7 VPADDD 400(SP), Y7, Y7 VPADDD 336(SP), Y1, Y1 VPADDD 176(SP), Y2, Y2 VPADDD 368(SP), Y3, Y3 VPADDD Y4, Y7, Y7 VPXOR Y7, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y5, Y1, Y1 VPXOR Y1, Y14, Y14 VPSHUFB rot16_shuf<>+0(SB), Y14, Y14 VPADDD Y6, Y2, Y2 VPXOR Y2, Y15, Y15 VPSHUFB rot16_shuf<>+0(SB), Y15, Y15 VPADDD Y0, Y3, Y3 VPXOR Y3, Y8, Y8 VPSHUFB rot16_shuf<>+0(SB), Y8, Y8 VPADDD Y9, Y12, Y12 VPXOR Y12, Y4, Y4 VPADDD Y14, Y13, Y13 VPXOR Y13, Y5, Y5 VPADDD Y15, Y10, Y10 VPXOR Y10, Y6, Y6 VPADDD Y8, Y11, Y11 VPXOR Y11, Y0, Y0 VMOVDQA Y7, (R11) VPSRLD $0x0c, Y4, Y7 VPSLLD $0x14, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x0c, Y5, Y7 VPSLLD $0x14, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x0c, Y6, Y7 VPSLLD $0x14, Y6, Y6 VPOR Y7, Y6, Y6 VPSRLD $0x0c, Y0, Y7 VPSLLD $0x14, Y0, Y0 VPOR Y7, Y0, Y0 VMOVDQA (R11), Y7 VPADDD 240(SP), Y7, Y7 VPADDD 272(SP), Y1, Y1 VPADDD 208(SP), Y2, Y2 VPADDD 304(SP), Y3, Y3 VPADDD Y4, Y7, Y7 VPXOR Y7, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y5, Y1, Y1 VPXOR Y1, Y14, Y14 VPSHUFB rot8_shuf<>+0(SB), Y14, Y14 VPADDD Y6, Y2, Y2 VPXOR Y2, Y15, Y15 VPSHUFB rot8_shuf<>+0(SB), Y15, Y15 VPADDD Y0, Y3, Y3 VPXOR Y3, Y8, Y8 VPSHUFB rot8_shuf<>+0(SB), Y8, Y8 VPADDD Y9, Y12, Y12 VPXOR Y12, Y4, Y4 VPADDD Y14, Y13, Y13 VPXOR Y13, Y5, Y5 VPADDD Y15, Y10, Y10 VPXOR Y10, Y6, Y6 VPADDD Y8, Y11, Y11 VPXOR Y11, Y0, Y0 VMOVDQA Y7, (R11) VPSRLD $0x07, Y4, Y7 VPSLLD $0x19, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x07, Y5, Y7 VPSLLD $0x19, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x07, Y6, Y7 VPSLLD $0x19, Y6, Y6 VPOR Y7, Y6, Y6 VPSRLD $0x07, Y0, Y7 VPSLLD $0x19, Y0, Y0 VPOR Y7, Y0, Y0 // Round 6 VMOVDQA (R11), Y7 VPADDD 464(SP), Y7, Y7 VPADDD 528(SP), Y1, Y1 VPADDD 432(SP), Y2, Y2 VPADDD 656(SP), Y3, Y3 VPADDD Y0, Y7, Y7 VPXOR Y7, Y14, Y14 VPSHUFB rot16_shuf<>+0(SB), Y14, Y14 VPADDD Y4, Y1, Y1 VPXOR Y1, Y15, Y15 VPSHUFB rot16_shuf<>+0(SB), Y15, Y15 VPADDD Y5, Y2, Y2 VPXOR Y2, Y8, Y8 VPSHUFB rot16_shuf<>+0(SB), Y8, Y8 VPADDD Y6, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y14, Y10, Y10 VPXOR Y10, Y0, Y0 VPADDD Y15, Y11, Y11 VPXOR Y11, Y4, Y4 VPADDD Y8, Y12, Y12 VPXOR Y12, Y5, Y5 VPADDD Y9, Y13, Y13 VPXOR Y13, Y6, Y6 VMOVDQA Y7, (R11) VPSRLD $0x0c, Y0, Y7 VPSLLD $0x14, Y0, Y0 VPOR Y7, Y0, Y0 VPSRLD $0x0c, Y4, Y7 VPSLLD $0x14, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x0c, Y5, Y7 VPSLLD $0x14, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x0c, Y6, Y7 VPSLLD $0x14, Y6, Y6 VPOR Y7, Y6, Y6 VMOVDQA (R11), Y7 VPADDD 624(SP), Y7, Y7 VPADDD 336(SP), Y1, Y1 VPADDD 560(SP), Y2, Y2 VPADDD 208(SP), Y3, Y3 VPADDD Y0, Y7, Y7 VPXOR Y7, Y14, Y14 VPSHUFB rot8_shuf<>+0(SB), Y14, Y14 VPADDD Y4, Y1, Y1 VPXOR Y1, Y15, Y15 VPSHUFB rot8_shuf<>+0(SB), Y15, Y15 VPADDD Y5, Y2, Y2 VPXOR Y2, Y8, Y8 VPSHUFB rot8_shuf<>+0(SB), Y8, Y8 VPADDD Y6, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y14, Y10, Y10 VPXOR Y10, Y0, Y0 VPADDD Y15, Y11, Y11 VPXOR Y11, Y4, Y4 VPADDD Y8, Y12, Y12 VPXOR Y12, Y5, Y5 VPADDD Y9, Y13, Y13 VPXOR Y13, Y6, Y6 VMOVDQA Y7, (R11) VPSRLD $0x07, Y0, Y7 VPSLLD $0x19, Y0, Y0 VPOR Y7, Y0, Y0 VPSRLD $0x07, Y4, Y7 VPSLLD $0x19, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x07, Y5, Y7 VPSLLD $0x19, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x07, Y6, Y7 VPSLLD $0x19, Y6, Y6 VPOR Y7, Y6, Y6 VMOVDQA (R11), Y7 VPADDD 592(SP), Y7, Y7 VPADDD 176(SP), Y1, Y1 VPADDD 240(SP), Y2, Y2 VPADDD 304(SP), Y3, Y3 VPADDD Y4, Y7, Y7 VPXOR Y7, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y5, Y1, Y1 VPXOR Y1, Y14, Y14 VPSHUFB rot16_shuf<>+0(SB), Y14, Y14 VPADDD Y6, Y2, Y2 VPXOR Y2, Y15, Y15 VPSHUFB rot16_shuf<>+0(SB), Y15, Y15 VPADDD Y0, Y3, Y3 VPXOR Y3, Y8, Y8 VPSHUFB rot16_shuf<>+0(SB), Y8, Y8 VPADDD Y9, Y12, Y12 VPXOR Y12, Y4, Y4 VPADDD Y14, Y13, Y13 VPXOR Y13, Y5, Y5 VPADDD Y15, Y10, Y10 VPXOR Y10, Y6, Y6 VPADDD Y8, Y11, Y11 VPXOR Y11, Y0, Y0 VMOVDQA Y7, (R11) VPSRLD $0x0c, Y4, Y7 VPSLLD $0x14, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x0c, Y5, Y7 VPSLLD $0x14, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x0c, Y6, Y7 VPSLLD $0x14, Y6, Y6 VPOR Y7, Y6, Y6 VPSRLD $0x0c, Y0, Y7 VPSLLD $0x14, Y0, Y0 VPOR Y7, Y0, Y0 VMOVDQA (R11), Y7 VPADDD 272(SP), Y7, Y7 VPADDD 496(SP), Y1, Y1 VPADDD 368(SP), Y2, Y2 VPADDD 400(SP), Y3, Y3 VPADDD Y4, Y7, Y7 VPXOR Y7, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y5, Y1, Y1 VPXOR Y1, Y14, Y14 VPSHUFB rot8_shuf<>+0(SB), Y14, Y14 VPADDD Y6, Y2, Y2 VPXOR Y2, Y15, Y15 VPSHUFB rot8_shuf<>+0(SB), Y15, Y15 VPADDD Y0, Y3, Y3 VPXOR Y3, Y8, Y8 VPSHUFB rot8_shuf<>+0(SB), Y8, Y8 VPADDD Y9, Y12, Y12 VPXOR Y12, Y4, Y4 VPADDD Y14, Y13, Y13 VPXOR Y13, Y5, Y5 VPADDD Y15, Y10, Y10 VPXOR Y10, Y6, Y6 VPADDD Y8, Y11, Y11 VPXOR Y11, Y0, Y0 VMOVDQA Y7, (R11) VPSRLD $0x07, Y4, Y7 VPSLLD $0x19, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x07, Y5, Y7 VPSLLD $0x19, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x07, Y6, Y7 VPSLLD $0x19, Y6, Y6 VPOR Y7, Y6, Y6 VPSRLD $0x07, Y0, Y7 VPSLLD $0x19, Y0, Y0 VPOR Y7, Y0, Y0 // Round 7 VMOVDQA (R11), Y7 VPADDD 528(SP), Y7, Y7 VPADDD 336(SP), Y1, Y1 VPADDD 208(SP), Y2, Y2 VPADDD 432(SP), Y3, Y3 VPADDD Y0, Y7, Y7 VPXOR Y7, Y14, Y14 VPSHUFB rot16_shuf<>+0(SB), Y14, Y14 VPADDD Y4, Y1, Y1 VPXOR Y1, Y15, Y15 VPSHUFB rot16_shuf<>+0(SB), Y15, Y15 VPADDD Y5, Y2, Y2 VPXOR Y2, Y8, Y8 VPSHUFB rot16_shuf<>+0(SB), Y8, Y8 VPADDD Y6, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y14, Y10, Y10 VPXOR Y10, Y0, Y0 VPADDD Y15, Y11, Y11 VPXOR Y11, Y4, Y4 VPADDD Y8, Y12, Y12 VPXOR Y12, Y5, Y5 VPADDD Y9, Y13, Y13 VPXOR Y13, Y6, Y6 VMOVDQA Y7, (R11) VPSRLD $0x0c, Y0, Y7 VPSLLD $0x14, Y0, Y0 VPOR Y7, Y0, Y0 VPSRLD $0x0c, Y4, Y7 VPSLLD $0x14, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x0c, Y5, Y7 VPSLLD $0x14, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x0c, Y6, Y7 VPSLLD $0x14, Y6, Y6 VPOR Y7, Y6, Y6 VMOVDQA (R11), Y7 VPADDD 656(SP), Y7, Y7 VPADDD 176(SP), Y1, Y1 VPADDD 464(SP), Y2, Y2 VPADDD 368(SP), Y3, Y3 VPADDD Y0, Y7, Y7 VPXOR Y7, Y14, Y14 VPSHUFB rot8_shuf<>+0(SB), Y14, Y14 VPADDD Y4, Y1, Y1 VPXOR Y1, Y15, Y15 VPSHUFB rot8_shuf<>+0(SB), Y15, Y15 VPADDD Y5, Y2, Y2 VPXOR Y2, Y8, Y8 VPSHUFB rot8_shuf<>+0(SB), Y8, Y8 VPADDD Y6, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y14, Y10, Y10 VPXOR Y10, Y0, Y0 VPADDD Y15, Y11, Y11 VPXOR Y11, Y4, Y4 VPADDD Y8, Y12, Y12 VPXOR Y12, Y5, Y5 VPADDD Y9, Y13, Y13 VPXOR Y13, Y6, Y6 VMOVDQA Y7, (R11) VPSRLD $0x07, Y0, Y7 VPSLLD $0x19, Y0, Y0 VPOR Y7, Y0, Y0 VPSRLD $0x07, Y4, Y7 VPSLLD $0x19, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x07, Y5, Y7 VPSLLD $0x19, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x07, Y6, Y7 VPSLLD $0x19, Y6, Y6 VPOR Y7, Y6, Y6 VMOVDQA (R11), Y7 VPADDD 624(SP), Y7, Y7 VPADDD 240(SP), Y1, Y1 VPADDD 272(SP), Y2, Y2 VPADDD 400(SP), Y3, Y3 VPADDD Y4, Y7, Y7 VPXOR Y7, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y5, Y1, Y1 VPXOR Y1, Y14, Y14 VPSHUFB rot16_shuf<>+0(SB), Y14, Y14 VPADDD Y6, Y2, Y2 VPXOR Y2, Y15, Y15 VPSHUFB rot16_shuf<>+0(SB), Y15, Y15 VPADDD Y0, Y3, Y3 VPXOR Y3, Y8, Y8 VPSHUFB rot16_shuf<>+0(SB), Y8, Y8 VPADDD Y9, Y12, Y12 VPXOR Y12, Y4, Y4 VPADDD Y14, Y13, Y13 VPXOR Y13, Y5, Y5 VPADDD Y15, Y10, Y10 VPXOR Y10, Y6, Y6 VPADDD Y8, Y11, Y11 VPXOR Y11, Y0, Y0 VMOVDQA Y7, (R11) VPSRLD $0x0c, Y4, Y7 VPSLLD $0x14, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x0c, Y5, Y7 VPSLLD $0x14, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x0c, Y6, Y7 VPSLLD $0x14, Y6, Y6 VPOR Y7, Y6, Y6 VPSRLD $0x0c, Y0, Y7 VPSLLD $0x14, Y0, Y0 VPOR Y7, Y0, Y0 VMOVDQA (R11), Y7 VPADDD 496(SP), Y7, Y7 VPADDD 560(SP), Y1, Y1 VPADDD 304(SP), Y2, Y2 VPADDD 592(SP), Y3, Y3 VPADDD Y4, Y7, Y7 VPXOR Y7, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y5, Y1, Y1 VPXOR Y1, Y14, Y14 VPSHUFB rot8_shuf<>+0(SB), Y14, Y14 VPADDD Y6, Y2, Y2 VPXOR Y2, Y15, Y15 VPSHUFB rot8_shuf<>+0(SB), Y15, Y15 VPADDD Y0, Y3, Y3 VPXOR Y3, Y8, Y8 VPSHUFB rot8_shuf<>+0(SB), Y8, Y8 VPADDD Y9, Y12, Y12 VPXOR Y12, Y4, Y4 VPADDD Y14, Y13, Y13 VPXOR Y13, Y5, Y5 VPADDD Y15, Y10, Y10 VPXOR Y10, Y6, Y6 VPADDD Y8, Y11, Y11 VPXOR Y11, Y0, Y0 VMOVDQA Y7, (R11) VPSRLD $0x07, Y4, Y7 VPSLLD $0x19, Y4, Y4 VPOR Y7, Y4, Y4 VPSRLD $0x07, Y5, Y7 VPSLLD $0x19, Y5, Y5 VPOR Y7, Y5, Y5 VPSRLD $0x07, Y6, Y7 VPSLLD $0x19, Y6, Y6 VPOR Y7, Y6, Y6 VPSRLD $0x07, Y0, Y7 VPSLLD $0x19, Y0, Y0 VPOR Y7, Y0, Y0 // Finalize rounds VPXOR Y9, Y6, Y6 VPXOR (R11), Y10, Y7 VPXOR Y11, Y1, Y1 VPXOR Y12, Y2, Y2 VPXOR Y13, Y3, Y3 VPXOR Y14, Y0, Y0 VPXOR Y15, Y4, Y4 VPXOR Y8, Y5, Y5 // Fix up registers for next iteration VMOVDQU Y7, Y8 VMOVDQU Y6, Y7 VMOVDQU Y5, Y6 VMOVDQU Y4, Y5 VMOVDQU Y0, Y4 VMOVDQU Y8, Y0 // If we have zero complete chunks, we're done CMPQ R9, $0x00 JNE loop_trailer CMPQ R10, CX JEQ finalize loop_trailer: // Increment, reset flags, and loop CMPQ CX, $0x000003c0 JEQ finalize ADDQ $0x40, CX MOVL BX, 64(SP) JMP loop finalize: // Store result into out VMOVDQU Y0, (DI) VMOVDQU Y1, 32(DI) VMOVDQU Y2, 64(DI) VMOVDQU Y3, 96(DI) VMOVDQU Y4, 128(DI) VMOVDQU Y5, 160(DI) VMOVDQU Y6, 192(DI) VMOVDQU Y7, 224(DI) VZEROUPPER RET // func HashP(left *[32]uint32, right *[32]uint32, flags uint8, key *[8]uint32, out *[32]uint32, n int) // Requires: AVX, AVX2 TEXT ·HashP(SB), NOSPLIT, $72-48 MOVQ left+0(FP), AX MOVQ right+8(FP), CX MOVBLZX flags+16(FP), DX MOVQ key+24(FP), BX MOVQ out+32(FP), SI // Allocate local space and align it LEAQ 31(SP), DI MOVQ $0x000000000000001f, R8 NOTQ R8 ANDQ R8, DI // Set up flags value MOVL DX, 64(SP) // Perform the rounds // Round 1 VPBROADCASTD (BX), Y0 VPADDD (AX), Y0, Y0 VPBROADCASTD 4(BX), Y1 VPADDD 64(AX), Y1, Y1 VPBROADCASTD 8(BX), Y2 VPADDD 128(AX), Y2, Y2 VPBROADCASTD 12(BX), Y3 VPADDD 192(AX), Y3, Y3 VPBROADCASTD 16(BX), Y4 VPADDD Y4, Y0, Y0 VMOVDQU zero<>+0(SB), Y5 VPXOR Y0, Y5, Y5 VPSHUFB rot16_shuf<>+0(SB), Y5, Y5 VPBROADCASTD 20(BX), Y6 VPADDD Y6, Y1, Y1 VMOVDQU zero<>+0(SB), Y7 VPXOR Y1, Y7, Y7 VPSHUFB rot16_shuf<>+0(SB), Y7, Y7 VPBROADCASTD 24(BX), Y8 VPADDD Y8, Y2, Y2 VMOVDQU block_len<>+0(SB), Y9 VPXOR Y2, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPBROADCASTD 28(BX), Y10 VPADDD Y10, Y3, Y3 VPBROADCASTD 64(SP), Y11 VPXOR Y3, Y11, Y11 VPSHUFB rot16_shuf<>+0(SB), Y11, Y11 VPBROADCASTD iv<>+0(SB), Y12 VPADDD Y5, Y12, Y12 VPXOR Y12, Y4, Y4 VPBROADCASTD iv<>+4(SB), Y13 VPADDD Y7, Y13, Y13 VPXOR Y13, Y6, Y6 VPBROADCASTD iv<>+8(SB), Y14 VPADDD Y9, Y14, Y14 VPXOR Y14, Y8, Y8 VPBROADCASTD iv<>+12(SB), Y15 VPADDD Y11, Y15, Y15 VPXOR Y15, Y10, Y10 VMOVDQA Y0, (DI) VPSRLD $0x0c, Y4, Y0 VPSLLD $0x14, Y4, Y4 VPOR Y0, Y4, Y0 VPSRLD $0x0c, Y6, Y4 VPSLLD $0x14, Y6, Y6 VPOR Y4, Y6, Y4 VPSRLD $0x0c, Y8, Y6 VPSLLD $0x14, Y8, Y8 VPOR Y6, Y8, Y6 VPSRLD $0x0c, Y10, Y8 VPSLLD $0x14, Y10, Y10 VPOR Y8, Y10, Y8 VMOVDQA (DI), Y10 VPADDD 32(AX), Y10, Y10 VPADDD 96(AX), Y1, Y1 VPADDD 160(AX), Y2, Y2 VPADDD 224(AX), Y3, Y3 VPADDD Y0, Y10, Y10 VPXOR Y10, Y5, Y5 VPSHUFB rot8_shuf<>+0(SB), Y5, Y5 VPADDD Y4, Y1, Y1 VPXOR Y1, Y7, Y7 VPSHUFB rot8_shuf<>+0(SB), Y7, Y7 VPADDD Y6, Y2, Y2 VPXOR Y2, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y8, Y3, Y3 VPXOR Y3, Y11, Y11 VPSHUFB rot8_shuf<>+0(SB), Y11, Y11 VPADDD Y5, Y12, Y12 VPXOR Y12, Y0, Y0 VPADDD Y7, Y13, Y13 VPXOR Y13, Y4, Y4 VPADDD Y9, Y14, Y14 VPXOR Y14, Y6, Y6 VPADDD Y11, Y15, Y15 VPXOR Y15, Y8, Y8 VMOVDQA Y10, (DI) VPSRLD $0x07, Y0, Y10 VPSLLD $0x19, Y0, Y0 VPOR Y10, Y0, Y0 VPSRLD $0x07, Y4, Y10 VPSLLD $0x19, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x07, Y6, Y10 VPSLLD $0x19, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x07, Y8, Y10 VPSLLD $0x19, Y8, Y8 VPOR Y10, Y8, Y8 VMOVDQA (DI), Y10 VPADDD (CX), Y10, Y10 VPADDD 64(CX), Y1, Y1 VPADDD 128(CX), Y2, Y2 VPADDD 192(CX), Y3, Y3 VPADDD Y4, Y10, Y10 VPXOR Y10, Y11, Y11 VPSHUFB rot16_shuf<>+0(SB), Y11, Y11 VPADDD Y6, Y1, Y1 VPXOR Y1, Y5, Y5 VPSHUFB rot16_shuf<>+0(SB), Y5, Y5 VPADDD Y8, Y2, Y2 VPXOR Y2, Y7, Y7 VPSHUFB rot16_shuf<>+0(SB), Y7, Y7 VPADDD Y0, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y11, Y14, Y14 VPXOR Y14, Y4, Y4 VPADDD Y5, Y15, Y15 VPXOR Y15, Y6, Y6 VPADDD Y7, Y12, Y12 VPXOR Y12, Y8, Y8 VPADDD Y9, Y13, Y13 VPXOR Y13, Y0, Y0 VMOVDQA Y10, (DI) VPSRLD $0x0c, Y4, Y10 VPSLLD $0x14, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x0c, Y6, Y10 VPSLLD $0x14, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x0c, Y8, Y10 VPSLLD $0x14, Y8, Y8 VPOR Y10, Y8, Y8 VPSRLD $0x0c, Y0, Y10 VPSLLD $0x14, Y0, Y0 VPOR Y10, Y0, Y0 VMOVDQA (DI), Y10 VPADDD 32(CX), Y10, Y10 VPADDD 96(CX), Y1, Y1 VPADDD 160(CX), Y2, Y2 VPADDD 224(CX), Y3, Y3 VPADDD Y4, Y10, Y10 VPXOR Y10, Y11, Y11 VPSHUFB rot8_shuf<>+0(SB), Y11, Y11 VPADDD Y6, Y1, Y1 VPXOR Y1, Y5, Y5 VPSHUFB rot8_shuf<>+0(SB), Y5, Y5 VPADDD Y8, Y2, Y2 VPXOR Y2, Y7, Y7 VPSHUFB rot8_shuf<>+0(SB), Y7, Y7 VPADDD Y0, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y11, Y14, Y14 VPXOR Y14, Y4, Y4 VPADDD Y5, Y15, Y15 VPXOR Y15, Y6, Y6 VPADDD Y7, Y12, Y12 VPXOR Y12, Y8, Y8 VPADDD Y9, Y13, Y13 VPXOR Y13, Y0, Y0 VMOVDQA Y10, (DI) VPSRLD $0x07, Y4, Y10 VPSLLD $0x19, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x07, Y6, Y10 VPSLLD $0x19, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x07, Y8, Y10 VPSLLD $0x19, Y8, Y8 VPOR Y10, Y8, Y8 VPSRLD $0x07, Y0, Y10 VPSLLD $0x19, Y0, Y0 VPOR Y10, Y0, Y0 // Round 2 VMOVDQA (DI), Y10 VPADDD 64(AX), Y10, Y10 VPADDD 96(AX), Y1, Y1 VPADDD 224(AX), Y2, Y2 VPADDD 128(AX), Y3, Y3 VPADDD Y0, Y10, Y10 VPXOR Y10, Y5, Y5 VPSHUFB rot16_shuf<>+0(SB), Y5, Y5 VPADDD Y4, Y1, Y1 VPXOR Y1, Y7, Y7 VPSHUFB rot16_shuf<>+0(SB), Y7, Y7 VPADDD Y6, Y2, Y2 VPXOR Y2, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y8, Y3, Y3 VPXOR Y3, Y11, Y11 VPSHUFB rot16_shuf<>+0(SB), Y11, Y11 VPADDD Y5, Y12, Y12 VPXOR Y12, Y0, Y0 VPADDD Y7, Y13, Y13 VPXOR Y13, Y4, Y4 VPADDD Y9, Y14, Y14 VPXOR Y14, Y6, Y6 VPADDD Y11, Y15, Y15 VPXOR Y15, Y8, Y8 VMOVDQA Y10, (DI) VPSRLD $0x0c, Y0, Y10 VPSLLD $0x14, Y0, Y0 VPOR Y10, Y0, Y0 VPSRLD $0x0c, Y4, Y10 VPSLLD $0x14, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x0c, Y6, Y10 VPSLLD $0x14, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x0c, Y8, Y10 VPSLLD $0x14, Y8, Y8 VPOR Y10, Y8, Y8 VMOVDQA (DI), Y10 VPADDD 192(AX), Y10, Y10 VPADDD 64(CX), Y1, Y1 VPADDD (AX), Y2, Y2 VPADDD 160(CX), Y3, Y3 VPADDD Y0, Y10, Y10 VPXOR Y10, Y5, Y5 VPSHUFB rot8_shuf<>+0(SB), Y5, Y5 VPADDD Y4, Y1, Y1 VPXOR Y1, Y7, Y7 VPSHUFB rot8_shuf<>+0(SB), Y7, Y7 VPADDD Y6, Y2, Y2 VPXOR Y2, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y8, Y3, Y3 VPXOR Y3, Y11, Y11 VPSHUFB rot8_shuf<>+0(SB), Y11, Y11 VPADDD Y5, Y12, Y12 VPXOR Y12, Y0, Y0 VPADDD Y7, Y13, Y13 VPXOR Y13, Y4, Y4 VPADDD Y9, Y14, Y14 VPXOR Y14, Y6, Y6 VPADDD Y11, Y15, Y15 VPXOR Y15, Y8, Y8 VMOVDQA Y10, (DI) VPSRLD $0x07, Y0, Y10 VPSLLD $0x19, Y0, Y0 VPOR Y10, Y0, Y0 VPSRLD $0x07, Y4, Y10 VPSLLD $0x19, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x07, Y6, Y10 VPSLLD $0x19, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x07, Y8, Y10 VPSLLD $0x19, Y8, Y8 VPOR Y10, Y8, Y8 VMOVDQA (DI), Y10 VPADDD 32(AX), Y10, Y10 VPADDD 128(CX), Y1, Y1 VPADDD 32(CX), Y2, Y2 VPADDD 224(CX), Y3, Y3 VPADDD Y4, Y10, Y10 VPXOR Y10, Y11, Y11 VPSHUFB rot16_shuf<>+0(SB), Y11, Y11 VPADDD Y6, Y1, Y1 VPXOR Y1, Y5, Y5 VPSHUFB rot16_shuf<>+0(SB), Y5, Y5 VPADDD Y8, Y2, Y2 VPXOR Y2, Y7, Y7 VPSHUFB rot16_shuf<>+0(SB), Y7, Y7 VPADDD Y0, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y11, Y14, Y14 VPXOR Y14, Y4, Y4 VPADDD Y5, Y15, Y15 VPXOR Y15, Y6, Y6 VPADDD Y7, Y12, Y12 VPXOR Y12, Y8, Y8 VPADDD Y9, Y13, Y13 VPXOR Y13, Y0, Y0 VMOVDQA Y10, (DI) VPSRLD $0x0c, Y4, Y10 VPSLLD $0x14, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x0c, Y6, Y10 VPSLLD $0x14, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x0c, Y8, Y10 VPSLLD $0x14, Y8, Y8 VPOR Y10, Y8, Y8 VPSRLD $0x0c, Y0, Y10 VPSLLD $0x14, Y0, Y0 VPOR Y10, Y0, Y0 VMOVDQA (DI), Y10 VPADDD 96(CX), Y10, Y10 VPADDD 160(AX), Y1, Y1 VPADDD 192(CX), Y2, Y2 VPADDD (CX), Y3, Y3 VPADDD Y4, Y10, Y10 VPXOR Y10, Y11, Y11 VPSHUFB rot8_shuf<>+0(SB), Y11, Y11 VPADDD Y6, Y1, Y1 VPXOR Y1, Y5, Y5 VPSHUFB rot8_shuf<>+0(SB), Y5, Y5 VPADDD Y8, Y2, Y2 VPXOR Y2, Y7, Y7 VPSHUFB rot8_shuf<>+0(SB), Y7, Y7 VPADDD Y0, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y11, Y14, Y14 VPXOR Y14, Y4, Y4 VPADDD Y5, Y15, Y15 VPXOR Y15, Y6, Y6 VPADDD Y7, Y12, Y12 VPXOR Y12, Y8, Y8 VPADDD Y9, Y13, Y13 VPXOR Y13, Y0, Y0 VMOVDQA Y10, (DI) VPSRLD $0x07, Y4, Y10 VPSLLD $0x19, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x07, Y6, Y10 VPSLLD $0x19, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x07, Y8, Y10 VPSLLD $0x19, Y8, Y8 VPOR Y10, Y8, Y8 VPSRLD $0x07, Y0, Y10 VPSLLD $0x19, Y0, Y0 VPOR Y10, Y0, Y0 // Round 3 VMOVDQA (DI), Y10 VPADDD 96(AX), Y10, Y10 VPADDD 64(CX), Y1, Y1 VPADDD 160(CX), Y2, Y2 VPADDD 224(AX), Y3, Y3 VPADDD Y0, Y10, Y10 VPXOR Y10, Y5, Y5 VPSHUFB rot16_shuf<>+0(SB), Y5, Y5 VPADDD Y4, Y1, Y1 VPXOR Y1, Y7, Y7 VPSHUFB rot16_shuf<>+0(SB), Y7, Y7 VPADDD Y6, Y2, Y2 VPXOR Y2, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y8, Y3, Y3 VPXOR Y3, Y11, Y11 VPSHUFB rot16_shuf<>+0(SB), Y11, Y11 VPADDD Y5, Y12, Y12 VPXOR Y12, Y0, Y0 VPADDD Y7, Y13, Y13 VPXOR Y13, Y4, Y4 VPADDD Y9, Y14, Y14 VPXOR Y14, Y6, Y6 VPADDD Y11, Y15, Y15 VPXOR Y15, Y8, Y8 VMOVDQA Y10, (DI) VPSRLD $0x0c, Y0, Y10 VPSLLD $0x14, Y0, Y0 VPOR Y10, Y0, Y0 VPSRLD $0x0c, Y4, Y10 VPSLLD $0x14, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x0c, Y6, Y10 VPSLLD $0x14, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x0c, Y8, Y10 VPSLLD $0x14, Y8, Y8 VPOR Y10, Y8, Y8 VMOVDQA (DI), Y10 VPADDD 128(AX), Y10, Y10 VPADDD 128(CX), Y1, Y1 VPADDD 64(AX), Y2, Y2 VPADDD 192(CX), Y3, Y3 VPADDD Y0, Y10, Y10 VPXOR Y10, Y5, Y5 VPSHUFB rot8_shuf<>+0(SB), Y5, Y5 VPADDD Y4, Y1, Y1 VPXOR Y1, Y7, Y7 VPSHUFB rot8_shuf<>+0(SB), Y7, Y7 VPADDD Y6, Y2, Y2 VPXOR Y2, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y8, Y3, Y3 VPXOR Y3, Y11, Y11 VPSHUFB rot8_shuf<>+0(SB), Y11, Y11 VPADDD Y5, Y12, Y12 VPXOR Y12, Y0, Y0 VPADDD Y7, Y13, Y13 VPXOR Y13, Y4, Y4 VPADDD Y9, Y14, Y14 VPXOR Y14, Y6, Y6 VPADDD Y11, Y15, Y15 VPXOR Y15, Y8, Y8 VMOVDQA Y10, (DI) VPSRLD $0x07, Y0, Y10 VPSLLD $0x19, Y0, Y0 VPOR Y10, Y0, Y0 VPSRLD $0x07, Y4, Y10 VPSLLD $0x19, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x07, Y6, Y10 VPSLLD $0x19, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x07, Y8, Y10 VPSLLD $0x19, Y8, Y8 VPOR Y10, Y8, Y8 VMOVDQA (DI), Y10 VPADDD 192(AX), Y10, Y10 VPADDD 32(CX), Y1, Y1 VPADDD 96(CX), Y2, Y2 VPADDD (CX), Y3, Y3 VPADDD Y4, Y10, Y10 VPXOR Y10, Y11, Y11 VPSHUFB rot16_shuf<>+0(SB), Y11, Y11 VPADDD Y6, Y1, Y1 VPXOR Y1, Y5, Y5 VPSHUFB rot16_shuf<>+0(SB), Y5, Y5 VPADDD Y8, Y2, Y2 VPXOR Y2, Y7, Y7 VPSHUFB rot16_shuf<>+0(SB), Y7, Y7 VPADDD Y0, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y11, Y14, Y14 VPXOR Y14, Y4, Y4 VPADDD Y5, Y15, Y15 VPXOR Y15, Y6, Y6 VPADDD Y7, Y12, Y12 VPXOR Y12, Y8, Y8 VPADDD Y9, Y13, Y13 VPXOR Y13, Y0, Y0 VMOVDQA Y10, (DI) VPSRLD $0x0c, Y4, Y10 VPSLLD $0x14, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x0c, Y6, Y10 VPSLLD $0x14, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x0c, Y8, Y10 VPSLLD $0x14, Y8, Y8 VPOR Y10, Y8, Y8 VPSRLD $0x0c, Y0, Y10 VPSLLD $0x14, Y0, Y0 VPOR Y10, Y0, Y0 VMOVDQA (DI), Y10 VPADDD 160(AX), Y10, Y10 VPADDD (AX), Y1, Y1 VPADDD 224(CX), Y2, Y2 VPADDD 32(AX), Y3, Y3 VPADDD Y4, Y10, Y10 VPXOR Y10, Y11, Y11 VPSHUFB rot8_shuf<>+0(SB), Y11, Y11 VPADDD Y6, Y1, Y1 VPXOR Y1, Y5, Y5 VPSHUFB rot8_shuf<>+0(SB), Y5, Y5 VPADDD Y8, Y2, Y2 VPXOR Y2, Y7, Y7 VPSHUFB rot8_shuf<>+0(SB), Y7, Y7 VPADDD Y0, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y11, Y14, Y14 VPXOR Y14, Y4, Y4 VPADDD Y5, Y15, Y15 VPXOR Y15, Y6, Y6 VPADDD Y7, Y12, Y12 VPXOR Y12, Y8, Y8 VPADDD Y9, Y13, Y13 VPXOR Y13, Y0, Y0 VMOVDQA Y10, (DI) VPSRLD $0x07, Y4, Y10 VPSLLD $0x19, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x07, Y6, Y10 VPSLLD $0x19, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x07, Y8, Y10 VPSLLD $0x19, Y8, Y8 VPOR Y10, Y8, Y8 VPSRLD $0x07, Y0, Y10 VPSLLD $0x19, Y0, Y0 VPOR Y10, Y0, Y0 // Round 4 VMOVDQA (DI), Y10 VPADDD 64(CX), Y10, Y10 VPADDD 128(CX), Y1, Y1 VPADDD 192(CX), Y2, Y2 VPADDD 160(CX), Y3, Y3 VPADDD Y0, Y10, Y10 VPXOR Y10, Y5, Y5 VPSHUFB rot16_shuf<>+0(SB), Y5, Y5 VPADDD Y4, Y1, Y1 VPXOR Y1, Y7, Y7 VPSHUFB rot16_shuf<>+0(SB), Y7, Y7 VPADDD Y6, Y2, Y2 VPXOR Y2, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y8, Y3, Y3 VPXOR Y3, Y11, Y11 VPSHUFB rot16_shuf<>+0(SB), Y11, Y11 VPADDD Y5, Y12, Y12 VPXOR Y12, Y0, Y0 VPADDD Y7, Y13, Y13 VPXOR Y13, Y4, Y4 VPADDD Y9, Y14, Y14 VPXOR Y14, Y6, Y6 VPADDD Y11, Y15, Y15 VPXOR Y15, Y8, Y8 VMOVDQA Y10, (DI) VPSRLD $0x0c, Y0, Y10 VPSLLD $0x14, Y0, Y0 VPOR Y10, Y0, Y0 VPSRLD $0x0c, Y4, Y10 VPSLLD $0x14, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x0c, Y6, Y10 VPSLLD $0x14, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x0c, Y8, Y10 VPSLLD $0x14, Y8, Y8 VPOR Y10, Y8, Y8 VMOVDQA (DI), Y10 VPADDD 224(AX), Y10, Y10 VPADDD 32(CX), Y1, Y1 VPADDD 96(AX), Y2, Y2 VPADDD 224(CX), Y3, Y3 VPADDD Y0, Y10, Y10 VPXOR Y10, Y5, Y5 VPSHUFB rot8_shuf<>+0(SB), Y5, Y5 VPADDD Y4, Y1, Y1 VPXOR Y1, Y7, Y7 VPSHUFB rot8_shuf<>+0(SB), Y7, Y7 VPADDD Y6, Y2, Y2 VPXOR Y2, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y8, Y3, Y3 VPXOR Y3, Y11, Y11 VPSHUFB rot8_shuf<>+0(SB), Y11, Y11 VPADDD Y5, Y12, Y12 VPXOR Y12, Y0, Y0 VPADDD Y7, Y13, Y13 VPXOR Y13, Y4, Y4 VPADDD Y9, Y14, Y14 VPXOR Y14, Y6, Y6 VPADDD Y11, Y15, Y15 VPXOR Y15, Y8, Y8 VMOVDQA Y10, (DI) VPSRLD $0x07, Y0, Y10 VPSLLD $0x19, Y0, Y0 VPOR Y10, Y0, Y0 VPSRLD $0x07, Y4, Y10 VPSLLD $0x19, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x07, Y6, Y10 VPSLLD $0x19, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x07, Y8, Y10 VPSLLD $0x19, Y8, Y8 VPOR Y10, Y8, Y8 VMOVDQA (DI), Y10 VPADDD 128(AX), Y10, Y10 VPADDD 96(CX), Y1, Y1 VPADDD 160(AX), Y2, Y2 VPADDD 32(AX), Y3, Y3 VPADDD Y4, Y10, Y10 VPXOR Y10, Y11, Y11 VPSHUFB rot16_shuf<>+0(SB), Y11, Y11 VPADDD Y6, Y1, Y1 VPXOR Y1, Y5, Y5 VPSHUFB rot16_shuf<>+0(SB), Y5, Y5 VPADDD Y8, Y2, Y2 VPXOR Y2, Y7, Y7 VPSHUFB rot16_shuf<>+0(SB), Y7, Y7 VPADDD Y0, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y11, Y14, Y14 VPXOR Y14, Y4, Y4 VPADDD Y5, Y15, Y15 VPXOR Y15, Y6, Y6 VPADDD Y7, Y12, Y12 VPXOR Y12, Y8, Y8 VPADDD Y9, Y13, Y13 VPXOR Y13, Y0, Y0 VMOVDQA Y10, (DI) VPSRLD $0x0c, Y4, Y10 VPSLLD $0x14, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x0c, Y6, Y10 VPSLLD $0x14, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x0c, Y8, Y10 VPSLLD $0x14, Y8, Y8 VPOR Y10, Y8, Y8 VPSRLD $0x0c, Y0, Y10 VPSLLD $0x14, Y0, Y0 VPOR Y10, Y0, Y0 VMOVDQA (DI), Y10 VPADDD (AX), Y10, Y10 VPADDD 64(AX), Y1, Y1 VPADDD (CX), Y2, Y2 VPADDD 192(AX), Y3, Y3 VPADDD Y4, Y10, Y10 VPXOR Y10, Y11, Y11 VPSHUFB rot8_shuf<>+0(SB), Y11, Y11 VPADDD Y6, Y1, Y1 VPXOR Y1, Y5, Y5 VPSHUFB rot8_shuf<>+0(SB), Y5, Y5 VPADDD Y8, Y2, Y2 VPXOR Y2, Y7, Y7 VPSHUFB rot8_shuf<>+0(SB), Y7, Y7 VPADDD Y0, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y11, Y14, Y14 VPXOR Y14, Y4, Y4 VPADDD Y5, Y15, Y15 VPXOR Y15, Y6, Y6 VPADDD Y7, Y12, Y12 VPXOR Y12, Y8, Y8 VPADDD Y9, Y13, Y13 VPXOR Y13, Y0, Y0 VMOVDQA Y10, (DI) VPSRLD $0x07, Y4, Y10 VPSLLD $0x19, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x07, Y6, Y10 VPSLLD $0x19, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x07, Y8, Y10 VPSLLD $0x19, Y8, Y8 VPOR Y10, Y8, Y8 VPSRLD $0x07, Y0, Y10 VPSLLD $0x19, Y0, Y0 VPOR Y10, Y0, Y0 // Round 5 VMOVDQA (DI), Y10 VPADDD 128(CX), Y10, Y10 VPADDD 32(CX), Y1, Y1 VPADDD 224(CX), Y2, Y2 VPADDD 192(CX), Y3, Y3 VPADDD Y0, Y10, Y10 VPXOR Y10, Y5, Y5 VPSHUFB rot16_shuf<>+0(SB), Y5, Y5 VPADDD Y4, Y1, Y1 VPXOR Y1, Y7, Y7 VPSHUFB rot16_shuf<>+0(SB), Y7, Y7 VPADDD Y6, Y2, Y2 VPXOR Y2, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y8, Y3, Y3 VPXOR Y3, Y11, Y11 VPSHUFB rot16_shuf<>+0(SB), Y11, Y11 VPADDD Y5, Y12, Y12 VPXOR Y12, Y0, Y0 VPADDD Y7, Y13, Y13 VPXOR Y13, Y4, Y4 VPADDD Y9, Y14, Y14 VPXOR Y14, Y6, Y6 VPADDD Y11, Y15, Y15 VPXOR Y15, Y8, Y8 VMOVDQA Y10, (DI) VPSRLD $0x0c, Y0, Y10 VPSLLD $0x14, Y0, Y0 VPOR Y10, Y0, Y0 VPSRLD $0x0c, Y4, Y10 VPSLLD $0x14, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x0c, Y6, Y10 VPSLLD $0x14, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x0c, Y8, Y10 VPSLLD $0x14, Y8, Y8 VPOR Y10, Y8, Y8 VMOVDQA (DI), Y10 VPADDD 160(CX), Y10, Y10 VPADDD 96(CX), Y1, Y1 VPADDD 64(CX), Y2, Y2 VPADDD (CX), Y3, Y3 VPADDD Y0, Y10, Y10 VPXOR Y10, Y5, Y5 VPSHUFB rot8_shuf<>+0(SB), Y5, Y5 VPADDD Y4, Y1, Y1 VPXOR Y1, Y7, Y7 VPSHUFB rot8_shuf<>+0(SB), Y7, Y7 VPADDD Y6, Y2, Y2 VPXOR Y2, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y8, Y3, Y3 VPXOR Y3, Y11, Y11 VPSHUFB rot8_shuf<>+0(SB), Y11, Y11 VPADDD Y5, Y12, Y12 VPXOR Y12, Y0, Y0 VPADDD Y7, Y13, Y13 VPXOR Y13, Y4, Y4 VPADDD Y9, Y14, Y14 VPXOR Y14, Y6, Y6 VPADDD Y11, Y15, Y15 VPXOR Y15, Y8, Y8 VMOVDQA Y10, (DI) VPSRLD $0x07, Y0, Y10 VPSLLD $0x19, Y0, Y0 VPOR Y10, Y0, Y0 VPSRLD $0x07, Y4, Y10 VPSLLD $0x19, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x07, Y6, Y10 VPSLLD $0x19, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x07, Y8, Y10 VPSLLD $0x19, Y8, Y8 VPOR Y10, Y8, Y8 VMOVDQA (DI), Y10 VPADDD 224(AX), Y10, Y10 VPADDD 160(AX), Y1, Y1 VPADDD (AX), Y2, Y2 VPADDD 192(AX), Y3, Y3 VPADDD Y4, Y10, Y10 VPXOR Y10, Y11, Y11 VPSHUFB rot16_shuf<>+0(SB), Y11, Y11 VPADDD Y6, Y1, Y1 VPXOR Y1, Y5, Y5 VPSHUFB rot16_shuf<>+0(SB), Y5, Y5 VPADDD Y8, Y2, Y2 VPXOR Y2, Y7, Y7 VPSHUFB rot16_shuf<>+0(SB), Y7, Y7 VPADDD Y0, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y11, Y14, Y14 VPXOR Y14, Y4, Y4 VPADDD Y5, Y15, Y15 VPXOR Y15, Y6, Y6 VPADDD Y7, Y12, Y12 VPXOR Y12, Y8, Y8 VPADDD Y9, Y13, Y13 VPXOR Y13, Y0, Y0 VMOVDQA Y10, (DI) VPSRLD $0x0c, Y4, Y10 VPSLLD $0x14, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x0c, Y6, Y10 VPSLLD $0x14, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x0c, Y8, Y10 VPSLLD $0x14, Y8, Y8 VPOR Y10, Y8, Y8 VPSRLD $0x0c, Y0, Y10 VPSLLD $0x14, Y0, Y0 VPOR Y10, Y0, Y0 VMOVDQA (DI), Y10 VPADDD 64(AX), Y10, Y10 VPADDD 96(AX), Y1, Y1 VPADDD 32(AX), Y2, Y2 VPADDD 128(AX), Y3, Y3 VPADDD Y4, Y10, Y10 VPXOR Y10, Y11, Y11 VPSHUFB rot8_shuf<>+0(SB), Y11, Y11 VPADDD Y6, Y1, Y1 VPXOR Y1, Y5, Y5 VPSHUFB rot8_shuf<>+0(SB), Y5, Y5 VPADDD Y8, Y2, Y2 VPXOR Y2, Y7, Y7 VPSHUFB rot8_shuf<>+0(SB), Y7, Y7 VPADDD Y0, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y11, Y14, Y14 VPXOR Y14, Y4, Y4 VPADDD Y5, Y15, Y15 VPXOR Y15, Y6, Y6 VPADDD Y7, Y12, Y12 VPXOR Y12, Y8, Y8 VPADDD Y9, Y13, Y13 VPXOR Y13, Y0, Y0 VMOVDQA Y10, (DI) VPSRLD $0x07, Y4, Y10 VPSLLD $0x19, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x07, Y6, Y10 VPSLLD $0x19, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x07, Y8, Y10 VPSLLD $0x19, Y8, Y8 VPOR Y10, Y8, Y8 VPSRLD $0x07, Y0, Y10 VPSLLD $0x19, Y0, Y0 VPOR Y10, Y0, Y0 // Round 6 VMOVDQA (DI), Y10 VPADDD 32(CX), Y10, Y10 VPADDD 96(CX), Y1, Y1 VPADDD (CX), Y2, Y2 VPADDD 224(CX), Y3, Y3 VPADDD Y0, Y10, Y10 VPXOR Y10, Y5, Y5 VPSHUFB rot16_shuf<>+0(SB), Y5, Y5 VPADDD Y4, Y1, Y1 VPXOR Y1, Y7, Y7 VPSHUFB rot16_shuf<>+0(SB), Y7, Y7 VPADDD Y6, Y2, Y2 VPXOR Y2, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y8, Y3, Y3 VPXOR Y3, Y11, Y11 VPSHUFB rot16_shuf<>+0(SB), Y11, Y11 VPADDD Y5, Y12, Y12 VPXOR Y12, Y0, Y0 VPADDD Y7, Y13, Y13 VPXOR Y13, Y4, Y4 VPADDD Y9, Y14, Y14 VPXOR Y14, Y6, Y6 VPADDD Y11, Y15, Y15 VPXOR Y15, Y8, Y8 VMOVDQA Y10, (DI) VPSRLD $0x0c, Y0, Y10 VPSLLD $0x14, Y0, Y0 VPOR Y10, Y0, Y0 VPSRLD $0x0c, Y4, Y10 VPSLLD $0x14, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x0c, Y6, Y10 VPSLLD $0x14, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x0c, Y8, Y10 VPSLLD $0x14, Y8, Y8 VPOR Y10, Y8, Y8 VMOVDQA (DI), Y10 VPADDD 192(CX), Y10, Y10 VPADDD 160(AX), Y1, Y1 VPADDD 128(CX), Y2, Y2 VPADDD 32(AX), Y3, Y3 VPADDD Y0, Y10, Y10 VPXOR Y10, Y5, Y5 VPSHUFB rot8_shuf<>+0(SB), Y5, Y5 VPADDD Y4, Y1, Y1 VPXOR Y1, Y7, Y7 VPSHUFB rot8_shuf<>+0(SB), Y7, Y7 VPADDD Y6, Y2, Y2 VPXOR Y2, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y8, Y3, Y3 VPXOR Y3, Y11, Y11 VPSHUFB rot8_shuf<>+0(SB), Y11, Y11 VPADDD Y5, Y12, Y12 VPXOR Y12, Y0, Y0 VPADDD Y7, Y13, Y13 VPXOR Y13, Y4, Y4 VPADDD Y9, Y14, Y14 VPXOR Y14, Y6, Y6 VPADDD Y11, Y15, Y15 VPXOR Y15, Y8, Y8 VMOVDQA Y10, (DI) VPSRLD $0x07, Y0, Y10 VPSLLD $0x19, Y0, Y0 VPOR Y10, Y0, Y0 VPSRLD $0x07, Y4, Y10 VPSLLD $0x19, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x07, Y6, Y10 VPSLLD $0x19, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x07, Y8, Y10 VPSLLD $0x19, Y8, Y8 VPOR Y10, Y8, Y8 VMOVDQA (DI), Y10 VPADDD 160(CX), Y10, Y10 VPADDD (AX), Y1, Y1 VPADDD 64(AX), Y2, Y2 VPADDD 128(AX), Y3, Y3 VPADDD Y4, Y10, Y10 VPXOR Y10, Y11, Y11 VPSHUFB rot16_shuf<>+0(SB), Y11, Y11 VPADDD Y6, Y1, Y1 VPXOR Y1, Y5, Y5 VPSHUFB rot16_shuf<>+0(SB), Y5, Y5 VPADDD Y8, Y2, Y2 VPXOR Y2, Y7, Y7 VPSHUFB rot16_shuf<>+0(SB), Y7, Y7 VPADDD Y0, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y11, Y14, Y14 VPXOR Y14, Y4, Y4 VPADDD Y5, Y15, Y15 VPXOR Y15, Y6, Y6 VPADDD Y7, Y12, Y12 VPXOR Y12, Y8, Y8 VPADDD Y9, Y13, Y13 VPXOR Y13, Y0, Y0 VMOVDQA Y10, (DI) VPSRLD $0x0c, Y4, Y10 VPSLLD $0x14, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x0c, Y6, Y10 VPSLLD $0x14, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x0c, Y8, Y10 VPSLLD $0x14, Y8, Y8 VPOR Y10, Y8, Y8 VPSRLD $0x0c, Y0, Y10 VPSLLD $0x14, Y0, Y0 VPOR Y10, Y0, Y0 VMOVDQA (DI), Y10 VPADDD 96(AX), Y10, Y10 VPADDD 64(CX), Y1, Y1 VPADDD 192(AX), Y2, Y2 VPADDD 224(AX), Y3, Y3 VPADDD Y4, Y10, Y10 VPXOR Y10, Y11, Y11 VPSHUFB rot8_shuf<>+0(SB), Y11, Y11 VPADDD Y6, Y1, Y1 VPXOR Y1, Y5, Y5 VPSHUFB rot8_shuf<>+0(SB), Y5, Y5 VPADDD Y8, Y2, Y2 VPXOR Y2, Y7, Y7 VPSHUFB rot8_shuf<>+0(SB), Y7, Y7 VPADDD Y0, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y11, Y14, Y14 VPXOR Y14, Y4, Y4 VPADDD Y5, Y15, Y15 VPXOR Y15, Y6, Y6 VPADDD Y7, Y12, Y12 VPXOR Y12, Y8, Y8 VPADDD Y9, Y13, Y13 VPXOR Y13, Y0, Y0 VMOVDQA Y10, (DI) VPSRLD $0x07, Y4, Y10 VPSLLD $0x19, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x07, Y6, Y10 VPSLLD $0x19, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x07, Y8, Y10 VPSLLD $0x19, Y8, Y8 VPOR Y10, Y8, Y8 VPSRLD $0x07, Y0, Y10 VPSLLD $0x19, Y0, Y0 VPOR Y10, Y0, Y0 // Round 7 VMOVDQA (DI), Y10 VPADDD 96(CX), Y10, Y10 VPADDD 160(AX), Y1, Y1 VPADDD 32(AX), Y2, Y2 VPADDD (CX), Y3, Y3 VPADDD Y0, Y10, Y10 VPXOR Y10, Y5, Y5 VPSHUFB rot16_shuf<>+0(SB), Y5, Y5 VPADDD Y4, Y1, Y1 VPXOR Y1, Y7, Y7 VPSHUFB rot16_shuf<>+0(SB), Y7, Y7 VPADDD Y6, Y2, Y2 VPXOR Y2, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y8, Y3, Y3 VPXOR Y3, Y11, Y11 VPSHUFB rot16_shuf<>+0(SB), Y11, Y11 VPADDD Y5, Y12, Y12 VPXOR Y12, Y0, Y0 VPADDD Y7, Y13, Y13 VPXOR Y13, Y4, Y4 VPADDD Y9, Y14, Y14 VPXOR Y14, Y6, Y6 VPADDD Y11, Y15, Y15 VPXOR Y15, Y8, Y8 VMOVDQA Y10, (DI) VPSRLD $0x0c, Y0, Y10 VPSLLD $0x14, Y0, Y0 VPOR Y10, Y0, Y0 VPSRLD $0x0c, Y4, Y10 VPSLLD $0x14, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x0c, Y6, Y10 VPSLLD $0x14, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x0c, Y8, Y10 VPSLLD $0x14, Y8, Y8 VPOR Y10, Y8, Y8 VMOVDQA (DI), Y10 VPADDD 224(CX), Y10, Y10 VPADDD (AX), Y1, Y1 VPADDD 32(CX), Y2, Y2 VPADDD 192(AX), Y3, Y3 VPADDD Y0, Y10, Y10 VPXOR Y10, Y5, Y5 VPSHUFB rot8_shuf<>+0(SB), Y5, Y5 VPADDD Y4, Y1, Y1 VPXOR Y1, Y7, Y7 VPSHUFB rot8_shuf<>+0(SB), Y7, Y7 VPADDD Y6, Y2, Y2 VPXOR Y2, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y8, Y3, Y3 VPXOR Y3, Y11, Y11 VPSHUFB rot8_shuf<>+0(SB), Y11, Y11 VPADDD Y5, Y12, Y12 VPXOR Y12, Y0, Y0 VPADDD Y7, Y13, Y13 VPXOR Y13, Y4, Y4 VPADDD Y9, Y14, Y14 VPXOR Y14, Y6, Y6 VPADDD Y11, Y15, Y15 VPXOR Y15, Y8, Y8 VMOVDQA Y10, (DI) VPSRLD $0x07, Y0, Y10 VPSLLD $0x19, Y0, Y0 VPOR Y10, Y0, Y0 VPSRLD $0x07, Y4, Y10 VPSLLD $0x19, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x07, Y6, Y10 VPSLLD $0x19, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x07, Y8, Y10 VPSLLD $0x19, Y8, Y8 VPOR Y10, Y8, Y8 VMOVDQA (DI), Y10 VPADDD 192(CX), Y10, Y10 VPADDD 64(AX), Y1, Y1 VPADDD 96(AX), Y2, Y2 VPADDD 224(AX), Y3, Y3 VPADDD Y4, Y10, Y10 VPXOR Y10, Y11, Y11 VPSHUFB rot16_shuf<>+0(SB), Y11, Y11 VPADDD Y6, Y1, Y1 VPXOR Y1, Y5, Y5 VPSHUFB rot16_shuf<>+0(SB), Y5, Y5 VPADDD Y8, Y2, Y2 VPXOR Y2, Y7, Y7 VPSHUFB rot16_shuf<>+0(SB), Y7, Y7 VPADDD Y0, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot16_shuf<>+0(SB), Y9, Y9 VPADDD Y11, Y14, Y14 VPXOR Y14, Y4, Y4 VPADDD Y5, Y15, Y15 VPXOR Y15, Y6, Y6 VPADDD Y7, Y12, Y12 VPXOR Y12, Y8, Y8 VPADDD Y9, Y13, Y13 VPXOR Y13, Y0, Y0 VMOVDQA Y10, (DI) VPSRLD $0x0c, Y4, Y10 VPSLLD $0x14, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x0c, Y6, Y10 VPSLLD $0x14, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x0c, Y8, Y10 VPSLLD $0x14, Y8, Y8 VPOR Y10, Y8, Y8 VPSRLD $0x0c, Y0, Y10 VPSLLD $0x14, Y0, Y0 VPOR Y10, Y0, Y0 VMOVDQA (DI), Y10 VPADDD 64(CX), Y10, Y10 VPADDD 128(CX), Y1, Y1 VPADDD 128(AX), Y2, Y2 VPADDD 160(CX), Y3, Y3 VPADDD Y4, Y10, Y10 VPXOR Y10, Y11, Y11 VPSHUFB rot8_shuf<>+0(SB), Y11, Y11 VPADDD Y6, Y1, Y1 VPXOR Y1, Y5, Y5 VPSHUFB rot8_shuf<>+0(SB), Y5, Y5 VPADDD Y8, Y2, Y2 VPXOR Y2, Y7, Y7 VPSHUFB rot8_shuf<>+0(SB), Y7, Y7 VPADDD Y0, Y3, Y3 VPXOR Y3, Y9, Y9 VPSHUFB rot8_shuf<>+0(SB), Y9, Y9 VPADDD Y11, Y14, Y14 VPXOR Y14, Y4, Y4 VPADDD Y5, Y15, Y15 VPXOR Y15, Y6, Y6 VPADDD Y7, Y12, Y12 VPXOR Y12, Y8, Y8 VPADDD Y9, Y13, Y13 VPXOR Y13, Y0, Y0 VMOVDQA Y10, (DI) VPSRLD $0x07, Y4, Y10 VPSLLD $0x19, Y4, Y4 VPOR Y10, Y4, Y4 VPSRLD $0x07, Y6, Y10 VPSLLD $0x19, Y6, Y6 VPOR Y10, Y6, Y6 VPSRLD $0x07, Y8, Y10 VPSLLD $0x19, Y8, Y8 VPOR Y10, Y8, Y8 VPSRLD $0x07, Y0, Y10 VPSLLD $0x19, Y0, Y0 VPOR Y10, Y0, Y0 // Finalize VPXOR (DI), Y12, Y10 VPXOR Y13, Y1, Y1 VPXOR Y14, Y2, Y2 VPXOR Y15, Y3, Y3 VPXOR Y5, Y0, Y0 VPXOR Y7, Y4, Y4 VPXOR Y9, Y6, Y5 VPXOR Y11, Y8, Y6 // Store result into out VMOVDQU Y10, (SI) VMOVDQU Y1, 32(SI) VMOVDQU Y2, 64(SI) VMOVDQU Y3, 96(SI) VMOVDQU Y0, 128(SI) VMOVDQU Y4, 160(SI) VMOVDQU Y5, 192(SI) VMOVDQU Y6, 224(SI) VZEROUPPER RET golang-github-zeebo-blake3-0.2.4/internal/alg/hash/hash_avx2/impl_other.go000066400000000000000000000007021512402427200263610ustar00rootroot00000000000000//go:build !amd64 // +build !amd64 package hash_avx2 import "github.com/zeebo/blake3/internal/alg/hash/hash_pure" func HashF(input *[8192]byte, length, counter uint64, flags uint32, key *[8]uint32, out *[64]uint32, chain *[8]uint32) { hash_pure.HashF(input, length, counter, flags, key, out, chain) } func HashP(left, right *[64]uint32, flags uint32, key *[8]uint32, out *[64]uint32, n int) { hash_pure.HashP(left, right, flags, key, out, n) } golang-github-zeebo-blake3-0.2.4/internal/alg/hash/hash_avx2/impl_test.go000066400000000000000000000026311512402427200262220ustar00rootroot00000000000000package hash_avx2_test import ( "testing" "github.com/zeebo/assert" "github.com/zeebo/blake3/internal/alg/hash/hash_avx2" "github.com/zeebo/blake3/internal/alg/hash/hash_pure" "github.com/zeebo/blake3/internal/consts" "github.com/zeebo/pcg" ) func TestHashF(t *testing.T) { if !consts.HasAVX2 { t.SkipNow() } var input [8192]byte var key [8]uint32 for n := 0; n <= 8192; n++ { var c1, c2 [8]uint32 var o1, o2 [64]uint32 ctr, flags := pcg.Uint64(), pcg.Uint32() for i := range &key { key[i] = pcg.Uint32() } for i := 0; i < n; i++ { input[i] = byte(i+1) % 251 } hash_avx2.HashF(&input, uint64(n), ctr, flags, &key, &o1, &c1) hash_pure.HashF(&input, uint64(n), ctr, flags, &key, &o2, &c2) for i := 0; (i+1)*1024 <= n; i++ { for j := 0; j < 8; j++ { assert.Equal(t, o1[i+8*j], o2[i+8*j]) } } if n%1024 != 0 { assert.Equal(t, c1, c2) } } } func TestHashP(t *testing.T) { if !consts.HasAVX2 { t.SkipNow() } var key [8]uint32 var left, right [64]uint32 for i := 0; i < 64; i++ { left[i] = uint32(i+1) % 251 right[i] = uint32(i+2) % 251 } for n := 1; n <= 8; n++ { var o1, o2 [64]uint32 for i := range &key { key[i] = pcg.Uint32() } hash_avx2.HashP(&left, &right, 0, &key, &o1, n) hash_pure.HashP(&left, &right, 0, &key, &o2, n) for i := 0; i < n; i++ { for j := 0; j < 8; j++ { assert.Equal(t, o1[i+8*j], o2[i+8*j]) } } } } golang-github-zeebo-blake3-0.2.4/internal/alg/hash/hash_avx2/stubs.go000066400000000000000000000004431512402427200253610ustar00rootroot00000000000000//go:build amd64 // +build amd64 package hash_avx2 //go:noescape func HashF(input *[8192]byte, length, counter uint64, flags uint32, key *[8]uint32, out *[64]uint32, chain *[8]uint32) //go:noescape func HashP(left, right *[64]uint32, flags uint32, key *[8]uint32, out *[64]uint32, n int) golang-github-zeebo-blake3-0.2.4/internal/alg/hash/hash_pure/000077500000000000000000000000001512402427200237645ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/internal/alg/hash/hash_pure/hashf.go000066400000000000000000000025141512402427200254060ustar00rootroot00000000000000package hash_pure import ( "unsafe" "github.com/zeebo/blake3/internal/alg/compress" "github.com/zeebo/blake3/internal/consts" "github.com/zeebo/blake3/internal/utils" ) func HashF(input *[8192]byte, length, counter uint64, flags uint32, key *[8]uint32, out *[64]uint32, chain *[8]uint32) { var tmp [16]uint32 for i := uint64(0); consts.ChunkLen*i < length && i < 8; i++ { bchain := *key bflags := flags | consts.Flag_ChunkStart start := consts.ChunkLen * i for n := uint64(0); n < 16; n++ { if n == 15 { bflags |= consts.Flag_ChunkEnd } if start+64*n >= length { break } if start+64+64*n >= length { *chain = bchain } var blockPtr *[16]uint32 if consts.OptimizeLittleEndian { blockPtr = (*[16]uint32)(unsafe.Pointer(&input[consts.ChunkLen*i+consts.BlockLen*n])) } else { var block [16]uint32 utils.BytesToWords((*[64]uint8)(unsafe.Pointer(&input[consts.ChunkLen*i+consts.BlockLen*n])), &block) blockPtr = &block } compress.Compress(&bchain, blockPtr, counter, consts.BlockLen, bflags, &tmp) bchain = *(*[8]uint32)(unsafe.Pointer(&tmp[0])) bflags = flags } out[i+0] = bchain[0] out[i+8] = bchain[1] out[i+16] = bchain[2] out[i+24] = bchain[3] out[i+32] = bchain[4] out[i+40] = bchain[5] out[i+48] = bchain[6] out[i+56] = bchain[7] counter++ } } golang-github-zeebo-blake3-0.2.4/internal/alg/hash/hash_pure/hashp.go000066400000000000000000000015411512402427200254170ustar00rootroot00000000000000package hash_pure import "github.com/zeebo/blake3/internal/alg/compress" func HashP(left, right *[64]uint32, flags uint32, key *[8]uint32, out *[64]uint32, n int) { var tmp [16]uint32 var block [16]uint32 for i := 0; i < n && i < 8; i++ { block[0] = left[i+0] block[1] = left[i+8] block[2] = left[i+16] block[3] = left[i+24] block[4] = left[i+32] block[5] = left[i+40] block[6] = left[i+48] block[7] = left[i+56] block[8] = right[i+0] block[9] = right[i+8] block[10] = right[i+16] block[11] = right[i+24] block[12] = right[i+32] block[13] = right[i+40] block[14] = right[i+48] block[15] = right[i+56] compress.Compress(key, &block, 0, 64, flags, &tmp) out[i+0] = tmp[0] out[i+8] = tmp[1] out[i+16] = tmp[2] out[i+24] = tmp[3] out[i+32] = tmp[4] out[i+40] = tmp[5] out[i+48] = tmp[6] out[i+56] = tmp[7] } } golang-github-zeebo-blake3-0.2.4/internal/consts/000077500000000000000000000000001512402427200216315ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/internal/consts/consts.go000066400000000000000000000010661512402427200234740ustar00rootroot00000000000000package consts var IV = [...]uint32{IV0, IV1, IV2, IV3, IV4, IV5, IV6, IV7} const ( IV0 = 0x6A09E667 IV1 = 0xBB67AE85 IV2 = 0x3C6EF372 IV3 = 0xA54FF53A IV4 = 0x510E527F IV5 = 0x9B05688C IV6 = 0x1F83D9AB IV7 = 0x5BE0CD19 ) const ( Flag_ChunkStart uint32 = 1 << 0 Flag_ChunkEnd uint32 = 1 << 1 Flag_Parent uint32 = 1 << 2 Flag_Root uint32 = 1 << 3 Flag_Keyed uint32 = 1 << 4 Flag_DeriveKeyContext uint32 = 1 << 5 Flag_DeriveKeyMaterial uint32 = 1 << 6 ) const ( BlockLen = 64 ChunkLen = 1024 ) golang-github-zeebo-blake3-0.2.4/internal/consts/cpu.go000066400000000000000000000005211512402427200227450ustar00rootroot00000000000000//go:build !purego package consts import ( "os" "github.com/klauspost/cpuid/v2" ) var ( HasAVX2 = cpuid.CPU.Has(cpuid.AVX2) && os.Getenv("BLAKE3_DISABLE_AVX2") == "" && os.Getenv("BLAKE3_PUREGO") == "" HasSSE41 = cpuid.CPU.Has(cpuid.SSE4) && os.Getenv("BLAKE3_DISABLE_SSE41") == "" && os.Getenv("BLAKE3_PUREGO") == "" ) golang-github-zeebo-blake3-0.2.4/internal/consts/cpu_little.go000066400000000000000000000003221512402427200243210ustar00rootroot00000000000000//go:build amd64 || 386 || arm || arm64 || mipsle || mips64le || ppc64le || riscv64 || wasm // +build amd64 386 arm arm64 mipsle mips64le ppc64le riscv64 wasm package consts const OptimizeLittleEndian = true golang-github-zeebo-blake3-0.2.4/internal/consts/cpu_other.go000066400000000000000000000003451512402427200241520ustar00rootroot00000000000000//go:build !amd64 && !386 && !arm && !arm64 && !mipsle && !mips64le && !ppc64le && !riscv64 && !wasm // +build !amd64,!386,!arm,!arm64,!mipsle,!mips64le,!ppc64le,!riscv64,!wasm package consts const OptimizeLittleEndian = false golang-github-zeebo-blake3-0.2.4/internal/consts/cpu_purego.go000066400000000000000000000001211512402427200243220ustar00rootroot00000000000000//go:build purego package consts const ( HasAVX2 = false HasSSE41 = false ) golang-github-zeebo-blake3-0.2.4/internal/utils/000077500000000000000000000000001512402427200214605ustar00rootroot00000000000000golang-github-zeebo-blake3-0.2.4/internal/utils/utils.go000066400000000000000000000050471512402427200231550ustar00rootroot00000000000000package utils import ( "encoding/binary" "unsafe" ) func SliceToArray32(bytes []byte) *[32]uint8 { return (*[32]uint8)(unsafe.Pointer(&bytes[0])) } func SliceToArray64(bytes []byte) *[64]uint8 { return (*[64]uint8)(unsafe.Pointer(&bytes[0])) } func BytesToWords(bytes *[64]uint8, words *[16]uint32) { words[0] = binary.LittleEndian.Uint32(bytes[0*4:]) words[1] = binary.LittleEndian.Uint32(bytes[1*4:]) words[2] = binary.LittleEndian.Uint32(bytes[2*4:]) words[3] = binary.LittleEndian.Uint32(bytes[3*4:]) words[4] = binary.LittleEndian.Uint32(bytes[4*4:]) words[5] = binary.LittleEndian.Uint32(bytes[5*4:]) words[6] = binary.LittleEndian.Uint32(bytes[6*4:]) words[7] = binary.LittleEndian.Uint32(bytes[7*4:]) words[8] = binary.LittleEndian.Uint32(bytes[8*4:]) words[9] = binary.LittleEndian.Uint32(bytes[9*4:]) words[10] = binary.LittleEndian.Uint32(bytes[10*4:]) words[11] = binary.LittleEndian.Uint32(bytes[11*4:]) words[12] = binary.LittleEndian.Uint32(bytes[12*4:]) words[13] = binary.LittleEndian.Uint32(bytes[13*4:]) words[14] = binary.LittleEndian.Uint32(bytes[14*4:]) words[15] = binary.LittleEndian.Uint32(bytes[15*4:]) } func WordsToBytes(words *[16]uint32, bytes []byte) { bytes = bytes[:64] binary.LittleEndian.PutUint32(bytes[0*4:1*4], words[0]) binary.LittleEndian.PutUint32(bytes[1*4:2*4], words[1]) binary.LittleEndian.PutUint32(bytes[2*4:3*4], words[2]) binary.LittleEndian.PutUint32(bytes[3*4:4*4], words[3]) binary.LittleEndian.PutUint32(bytes[4*4:5*4], words[4]) binary.LittleEndian.PutUint32(bytes[5*4:6*4], words[5]) binary.LittleEndian.PutUint32(bytes[6*4:7*4], words[6]) binary.LittleEndian.PutUint32(bytes[7*4:8*4], words[7]) binary.LittleEndian.PutUint32(bytes[8*4:9*4], words[8]) binary.LittleEndian.PutUint32(bytes[9*4:10*4], words[9]) binary.LittleEndian.PutUint32(bytes[10*4:11*4], words[10]) binary.LittleEndian.PutUint32(bytes[11*4:12*4], words[11]) binary.LittleEndian.PutUint32(bytes[12*4:13*4], words[12]) binary.LittleEndian.PutUint32(bytes[13*4:14*4], words[13]) binary.LittleEndian.PutUint32(bytes[14*4:15*4], words[14]) binary.LittleEndian.PutUint32(bytes[15*4:16*4], words[15]) } func KeyFromBytes(key []byte, out *[8]uint32) { key = key[:32] out[0] = binary.LittleEndian.Uint32(key[0:]) out[1] = binary.LittleEndian.Uint32(key[4:]) out[2] = binary.LittleEndian.Uint32(key[8:]) out[3] = binary.LittleEndian.Uint32(key[12:]) out[4] = binary.LittleEndian.Uint32(key[16:]) out[5] = binary.LittleEndian.Uint32(key[20:]) out[6] = binary.LittleEndian.Uint32(key[24:]) out[7] = binary.LittleEndian.Uint32(key[28:]) } golang-github-zeebo-blake3-0.2.4/internal/utils/utils_test.go000066400000000000000000000006221512402427200242060ustar00rootroot00000000000000package utils import ( "testing" "unsafe" "github.com/zeebo/assert" "github.com/zeebo/blake3/internal/consts" ) func TestBytesToWords(t *testing.T) { if !consts.OptimizeLittleEndian { t.SkipNow() } var bytes [64]uint8 for i := range bytes { bytes[i] = byte(i) } var words [16]uint32 BytesToWords(&bytes, &words) assert.Equal(t, *(*[16]uint32)(unsafe.Pointer(&bytes[0])), words) } golang-github-zeebo-blake3-0.2.4/vec_test.go000066400000000000000000000461551512402427200206620ustar00rootroot00000000000000package blake3 type testVec struct { inputLen int hash string keyedHash string deriveKey string } func (tv *testVec) input() []byte { out := make([]byte, tv.inputLen) for i := range out { out[i] = uint8(i % 251) } return out } const ( testVectorKey = "whats the Elvish word for friend" testVectorContext = "BLAKE3 2019-12-27 16:29:52 test vectors context" ) var vectors = []testVec{ { inputLen: 0, hash: "af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262e00f03e7b69af26b7faaf09fcd333050338ddfe085b8cc869ca98b206c08243a26f5487789e8f660afe6c99ef9e0c52b92e7393024a80459cf91f476f9ffdbda7001c22e159b402631f277ca96f2defdf1078282314e763699a31c5363165421cce14d", keyedHash: "92b2b75604ed3c761f9d6f62392c8a9227ad0ea3f09573e783f1498a4ed60d26b18171a2f22a4b94822c701f107153dba24918c4bae4d2945c20ece13387627d3b73cbf97b797d5e59948c7ef788f54372df45e45e4293c7dc18c1d41144a9758be58960856be1eabbe22c2653190de560ca3b2ac4aa692a9210694254c371e851bc8f", deriveKey: "2cc39783c223154fea8dfb7c1b1660f2ac2dcbd1c1de8277b0b0dd39b7e50d7d905630c8be290dfcf3e6842f13bddd573c098c3f17361f1f206b8cad9d088aa4a3f746752c6b0ce6a83b0da81d59649257cdf8eb3e9f7d4998e41021fac119deefb896224ac99f860011f73609e6e0e4540f93b273e56547dfd3aa1a035ba6689d89a0", }, { inputLen: 1, hash: "2d3adedff11b61f14c886e35afa036736dcd87a74d27b5c1510225d0f592e213c3a6cb8bf623e20cdb535f8d1a5ffb86342d9c0b64aca3bce1d31f60adfa137b358ad4d79f97b47c3d5e79f179df87a3b9776ef8325f8329886ba42f07fb138bb502f4081cbcec3195c5871e6c23e2cc97d3c69a613eba131e5f1351f3f1da786545e5", keyedHash: "6d7878dfff2f485635d39013278ae14f1454b8c0a3a2d34bc1ab38228a80c95b6568c0490609413006fbd428eb3fd14e7756d90f73a4725fad147f7bf70fd61c4e0cf7074885e92b0e3f125978b4154986d4fb202a3f331a3fb6cf349a3a70e49990f98fe4289761c8602c4e6ab1138d31d3b62218078b2f3ba9a88e1d08d0dd4cea11", deriveKey: "b3e2e340a117a499c6cf2398a19ee0d29cca2bb7404c73063382693bf66cb06c5827b91bf889b6b97c5477f535361caefca0b5d8c4746441c57617111933158950670f9aa8a05d791daae10ac683cbef8faf897c84e6114a59d2173c3f417023a35d6983f2c7dfa57e7fc559ad751dbfb9ffab39c2ef8c4aafebc9ae973a64f0c76551", }, { inputLen: 1023, hash: "10108970eeda3eb932baac1428c7a2163b0e924c9a9e25b35bba72b28f70bd11a182d27a591b05592b15607500e1e8dd56bc6c7fc063715b7a1d737df5bad3339c56778957d870eb9717b57ea3d9fb68d1b55127bba6a906a4a24bbd5acb2d123a37b28f9e9a81bbaae360d58f85e5fc9d75f7c370a0cc09b6522d9c8d822f2f28f485", keyedHash: "c951ecdf03288d0fcc96ee3413563d8a6d3589547f2c2fb36d9786470f1b9d6e890316d2e6d8b8c25b0a5b2180f94fb1a158ef508c3cde45e2966bd796a696d3e13efd86259d756387d9becf5c8bf1ce2192b87025152907b6d8cc33d17826d8b7b9bc97e38c3c85108ef09f013e01c229c20a83d9e8efac5b37470da28575fd755a10", deriveKey: "74a16c1c3d44368a86e1ca6df64be6a2f64cce8f09220787450722d85725dea59c413264404661e9e4d955409dfe4ad3aa487871bcd454ed12abfe2c2b1eb7757588cf6cb18d2eccad49e018c0d0fec323bec82bf1644c6325717d13ea712e6840d3e6e730d35553f59eff5377a9c350bcc1556694b924b858f329c44ee64b884ef00d", }, { inputLen: 1024, hash: "42214739f095a406f3fc83deb889744ac00df831c10daa55189b5d121c855af71cf8107265ecdaf8505b95d8fcec83a98a6a96ea5109d2c179c47a387ffbb404756f6eeae7883b446b70ebb144527c2075ab8ab204c0086bb22b7c93d465efc57f8d917f0b385c6df265e77003b85102967486ed57db5c5ca170ba441427ed9afa684e", keyedHash: "75c46f6f3d9eb4f55ecaaee480db732e6c2105546f1e675003687c31719c7ba4a78bc838c72852d4f49c864acb7adafe2478e824afe51c8919d06168414c265f298a8094b1ad813a9b8614acabac321f24ce61c5a5346eb519520d38ecc43e89b5000236df0597243e4d2493fd626730e2ba17ac4d8824d09d1a4a8f57b8227778e2de", deriveKey: "7356cd7720d5b66b6d0697eb3177d9f8d73a4a5c5e968896eb6a6896843027066c23b601d3ddfb391e90d5c8eccdef4ae2a264bce9e612ba15e2bc9d654af1481b2e75dbabe615974f1070bba84d56853265a34330b4766f8e75edd1f4a1650476c10802f22b64bd3919d246ba20a17558bc51c199efdec67e80a227251808d8ce5bad", }, { inputLen: 1025, hash: "d00278ae47eb27b34faecf67b4fe263f82d5412916c1ffd97c8cb7fb814b8444f4c4a22b4b399155358a994e52bf255de60035742ec71bd08ac275a1b51cc6bfe332b0ef84b409108cda080e6269ed4b3e2c3f7d722aa4cdc98d16deb554e5627be8f955c98e1d5f9565a9194cad0c4285f93700062d9595adb992ae68ff12800ab67a", keyedHash: "357dc55de0c7e382c900fd6e320acc04146be01db6a8ce7210b7189bd664ea69362396b77fdc0d2634a552970843722066c3c15902ae5097e00ff53f1e116f1cd5352720113a837ab2452cafbde4d54085d9cf5d21ca613071551b25d52e69d6c81123872b6f19cd3bc1333edf0c52b94de23ba772cf82636cff4542540a7738d5b930", deriveKey: "effaa245f065fbf82ac186839a249707c3bddf6d3fdda22d1b95a3c970379bcb5d31013a167509e9066273ab6e2123bc835b408b067d88f96addb550d96b6852dad38e320b9d940f86db74d398c770f462118b35d2724efa13da97194491d96dd37c3c09cbef665953f2ee85ec83d88b88d11547a6f911c8217cca46defa2751e7f3ad", }, { inputLen: 2048, hash: "e776b6028c7cd22a4d0ba182a8bf62205d2ef576467e838ed6f2529b85fba24a9a60bf80001410ec9eea6698cd537939fad4749edd484cb541aced55cd9bf54764d063f23f6f1e32e12958ba5cfeb1bf618ad094266d4fc3c968c2088f677454c288c67ba0dba337b9d91c7e1ba586dc9a5bc2d5e90c14f53a8863ac75655461cea8f9", keyedHash: "879cf1fa2ea0e79126cb1063617a05b6ad9d0b696d0d757cf053439f60a99dd10173b961cd574288194b23ece278c330fbb8585485e74967f31352a8183aa782b2b22f26cdcadb61eed1a5bc144b8198fbb0c13abbf8e3192c145d0a5c21633b0ef86054f42809df823389ee40811a5910dcbd1018af31c3b43aa55201ed4edaac74fe", deriveKey: "7b2945cb4fef70885cc5d78a87bf6f6207dd901ff239201351ffac04e1088a23e2c11a1ebffcea4d80447867b61badb1383d842d4e79645d48dd82ccba290769caa7af8eaa1bd78a2a5e6e94fbdab78d9c7b74e894879f6a515257ccf6f95056f4e25390f24f6b35ffbb74b766202569b1d797f2d4bd9d17524c720107f985f4ddc583", }, { inputLen: 2049, hash: "5f4d72f40d7a5f82b15ca2b2e44b1de3c2ef86c426c95c1af0b687952256303096de31d71d74103403822a2e0bc1eb193e7aecc9643a76b7bbc0c9f9c52e8783aae98764ca468962b5c2ec92f0c74eb5448d519713e09413719431c802f948dd5d90425a4ecdadece9eb178d80f26efccae630734dff63340285adec2aed3b51073ad3", keyedHash: "9f29700902f7c86e514ddc4df1e3049f258b2472b6dd5267f61bf13983b78dd5f9a88abfefdfa1e00b418971f2b39c64ca621e8eb37fceac57fd0c8fc8e117d43b81447be22d5d8186f8f5919ba6bcc6846bd7d50726c06d245672c2ad4f61702c646499ee1173daa061ffe15bf45a631e2946d616a4c345822f1151284712f76b2b0e", deriveKey: "2ea477c5515cc3dd606512ee72bb3e0e758cfae7232826f35fb98ca1bcbdf27316d8e9e79081a80b046b60f6a263616f33ca464bd78d79fa18200d06c7fc9bffd808cc4755277a7d5e09da0f29ed150f6537ea9bed946227ff184cc66a72a5f8c1e4bd8b04e81cf40fe6dc4427ad5678311a61f4ffc39d195589bdbc670f63ae70f4b6", }, { inputLen: 3072, hash: "b98cb0ff3623be03326b373de6b9095218513e64f1ee2edd2525c7ad1e5cffd29a3f6b0b978d6608335c09dc94ccf682f9951cdfc501bfe47b9c9189a6fc7b404d120258506341a6d802857322fbd20d3e5dae05b95c88793fa83db1cb08e7d8008d1599b6209d78336e24839724c191b2a52a80448306e0daa84a3fdb566661a37e11", keyedHash: "044a0e7b172a312dc02a4c9a818c036ffa2776368d7f528268d2e6b5df19177022f302d0529e4174cc507c463671217975e81dab02b8fdeb0d7ccc7568dd22574c783a76be215441b32e91b9a904be8ea81f7a0afd14bad8ee7c8efc305ace5d3dd61b996febe8da4f56ca0919359a7533216e2999fc87ff7d8f176fbecb3d6f34278b", deriveKey: "050df97f8c2ead654d9bb3ab8c9178edcd902a32f8495949feadcc1e0480c46b3604131bbd6e3ba573b6dd682fa0a63e5b165d39fc43a625d00207607a2bfeb65ff1d29292152e26b298868e3b87be95d6458f6f2ce6118437b632415abe6ad522874bcd79e4030a5e7bad2efa90a7a7c67e93f0a18fb28369d0a9329ab5c24134ccb0", }, { inputLen: 3073, hash: "7124b49501012f81cc7f11ca069ec9226cecb8a2c850cfe644e327d22d3e1cd39a27ae3b79d68d89da9bf25bc27139ae65a324918a5f9b7828181e52cf373c84f35b639b7fccbb985b6f2fa56aea0c18f531203497b8bbd3a07ceb5926f1cab74d14bd66486d9a91eba99059a98bd1cd25876b2af5a76c3e9eed554ed72ea952b603bf", keyedHash: "68dede9bef00ba89e43f31a6825f4cf433389fedae75c04ee9f0cf16a427c95a96d6da3fe985054d3478865be9a092250839a697bbda74e279e8a9e69f0025e4cfddd6cfb434b1cd9543aaf97c635d1b451a4386041e4bb100f5e45407cbbc24fa53ea2de3536ccb329e4eb9466ec37093a42cf62b82903c696a93a50b702c80f3c3c5", deriveKey: "72613c9ec9ff7e40f8f5c173784c532ad852e827dba2bf85b2ab4b76f7079081576288e552647a9d86481c2cae75c2dd4e7c5195fb9ada1ef50e9c5098c249d743929191441301c69e1f48505a4305ec1778450ee48b8e69dc23a25960fe33070ea549119599760a8a2d28aeca06b8c5e9ba58bc19e11fe57b6ee98aa44b2a8e6b14a5", }, { inputLen: 4096, hash: "015094013f57a5277b59d8475c0501042c0b642e531b0a1c8f58d2163229e9690289e9409ddb1b99768eafe1623da896faf7e1114bebeadc1be30829b6f8af707d85c298f4f0ff4d9438aef948335612ae921e76d411c3a9111df62d27eaf871959ae0062b5492a0feb98ef3ed4af277f5395172dbe5c311918ea0074ce0036454f620", keyedHash: "befc660aea2f1718884cd8deb9902811d332f4fc4a38cf7c7300d597a081bfc0bbb64a36edb564e01e4b4aaf3b060092a6b838bea44afebd2deb8298fa562b7b597c757b9df4c911c3ca462e2ac89e9a787357aaf74c3b56d5c07bc93ce899568a3eb17d9250c20f6c5f6c1e792ec9a2dcb715398d5a6ec6d5c54f586a00403a1af1de", deriveKey: "1e0d7f3db8c414c97c6307cbda6cd27ac3b030949da8e23be1a1a924ad2f25b9d78038f7b198596c6cc4a9ccf93223c08722d684f240ff6569075ed81591fd93f9fff1110b3a75bc67e426012e5588959cc5a4c192173a03c00731cf84544f65a2fb9378989f72e9694a6a394a8a30997c2e67f95a504e631cd2c5f55246024761b245", }, { inputLen: 4097, hash: "9b4052b38f1c5fc8b1f9ff7ac7b27cd242487b3d890d15c96a1c25b8aa0fb99505f91b0b5600a11251652eacfa9497b31cd3c409ce2e45cfe6c0a016967316c426bd26f619eab5d70af9a418b845c608840390f361630bd497b1ab44019316357c61dbe091ce72fc16dc340ac3d6e009e050b3adac4b5b2c92e722cffdc46501531956", keyedHash: "00df940cd36bb9fa7cbbc3556744e0dbc8191401afe70520ba292ee3ca80abbc606db4976cfdd266ae0abf667d9481831ff12e0caa268e7d3e57260c0824115a54ce595ccc897786d9dcbf495599cfd90157186a46ec800a6763f1c59e36197e9939e900809f7077c102f888caaf864b253bc41eea812656d46742e4ea42769f89b83f", deriveKey: "aca51029626b55fda7117b42a7c211f8c6e9ba4fe5b7a8ca922f34299500ead8a897f66a400fed9198fd61dd2d58d382458e64e100128075fc54b860934e8de2e84170734b06e1d212a117100820dbc48292d148afa50567b8b84b1ec336ae10d40c8c975a624996e12de31abbe135d9d159375739c333798a80c64ae895e51e22f3ad", }, { inputLen: 5120, hash: "9cadc15fed8b5d854562b26a9536d9707cadeda9b143978f319ab34230535833acc61c8fdc114a2010ce8038c853e121e1544985133fccdd0a2d507e8e615e611e9a0ba4f47915f49e53d721816a9198e8b30f12d20ec3689989175f1bf7a300eee0d9321fad8da232ece6efb8e9fd81b42ad161f6b9550a069e66b11b40487a5f5059", keyedHash: "2c493e48e9b9bf31e0553a22b23503c0a3388f035cece68eb438d22fa1943e209b4dc9209cd80ce7c1f7c9a744658e7e288465717ae6e56d5463d4f80cdb2ef56495f6a4f5487f69749af0c34c2cdfa857f3056bf8d807336a14d7b89bf62bef2fb54f9af6a546f818dc1e98b9e07f8a5834da50fa28fb5874af91bf06020d1bf0120e", deriveKey: "7a7acac8a02adcf3038d74cdd1d34527de8a0fcc0ee3399d1262397ce5817f6055d0cefd84d9d57fe792d65a278fd20384ac6c30fdb340092f1a74a92ace99c482b28f0fc0ef3b923e56ade20c6dba47e49227166251337d80a037e987ad3a7f728b5ab6dfafd6e2ab1bd583a95d9c895ba9c2422c24ea0f62961f0dca45cad47bfa0d", }, { inputLen: 5121, hash: "628bd2cb2004694adaab7bbd778a25df25c47b9d4155a55f8fbd79f2fe154cff96adaab0613a6146cdaabe498c3a94e529d3fc1da2bd08edf54ed64d40dcd6777647eac51d8277d70219a9694334a68bc8f0f23e20b0ff70ada6f844542dfa32cd4204ca1846ef76d811cdb296f65e260227f477aa7aa008bac878f72257484f2b6c95", keyedHash: "6ccf1c34753e7a044db80798ecd0782a8f76f33563accaddbfbb2e0ea4b2d0240d07e63f13667a8d1490e5e04f13eb617aea16a8c8a5aaed1ef6fbde1b0515e3c81050b361af6ead126032998290b563e3caddeaebfab592e155f2e161fb7cba939092133f23f9e65245e58ec23457b78a2e8a125588aad6e07d7f11a85b88d375b72d", deriveKey: "b07f01e518e702f7ccb44a267e9e112d403a7b3f4883a47ffbed4b48339b3c341a0add0ac032ab5aaea1e4e5b004707ec5681ae0fcbe3796974c0b1cf31a194740c14519273eedaabec832e8a784b6e7cfc2c5952677e6c3f2c3914454082d7eb1ce1766ac7d75a4d3001fc89544dd46b5147382240d689bbbaefc359fb6ae30263165", }, { inputLen: 6144, hash: "3e2e5b74e048f3add6d21faab3f83aa44d3b2278afb83b80b3c35164ebeca2054d742022da6fdda444ebc384b04a54c3ac5839b49da7d39f6d8a9db03deab32aade156c1c0311e9b3435cde0ddba0dce7b26a376cad121294b689193508dd63151603c6ddb866ad16c2ee41585d1633a2cea093bea714f4c5d6b903522045b20395c83", keyedHash: "3d6b6d21281d0ade5b2b016ae4034c5dec10ca7e475f90f76eac7138e9bc8f1dc35754060091dc5caf3efabe0603c60f45e415bb3407db67e6beb3d11cf8e4f7907561f05dace0c15807f4b5f389c841eb114d81a82c02a00b57206b1d11fa6e803486b048a5ce87105a686dee041207e095323dfe172df73deb8c9532066d88f9da7e", deriveKey: "2a95beae63ddce523762355cf4b9c1d8f131465780a391286a5d01abb5683a1597099e3c6488aab6c48f3c15dbe1942d21dbcdc12115d19a8b8465fb54e9053323a9178e4275647f1a9927f6439e52b7031a0b465c861a3fc531527f7758b2b888cf2f20582e9e2c593709c0a44f9c6e0f8b963994882ea4168827823eef1f64169fef", }, { inputLen: 6145, hash: "f1323a8631446cc50536a9f705ee5cb619424d46887f3c376c695b70e0f0507f18a2cfdd73c6e39dd75ce7c1c6e3ef238fd54465f053b25d21044ccb2093beb015015532b108313b5829c3621ce324b8e14229091b7c93f32db2e4e63126a377d2a63a3597997d4f1cba59309cb4af240ba70cebff9a23d5e3ff0cdae2cfd54e070022", keyedHash: "9ac301e9e39e45e3250a7e3b3df701aa0fb6889fbd80eeecf28dbc6300fbc539f3c184ca2f59780e27a576c1d1fb9772e99fd17881d02ac7dfd39675aca918453283ed8c3169085ef4a466b91c1649cc341dfdee60e32231fc34c9c4e0b9a2ba87ca8f372589c744c15fd6f985eec15e98136f25beeb4b13c4e43dc84abcc79cd4646c", deriveKey: "379bcc61d0051dd489f686c13de00d5b14c505245103dc040d9e4dd1facab8e5114493d029bdbd295aaa744a59e31f35c7f52dba9c3642f773dd0b4262a9980a2aef811697e1305d37ba9d8b6d850ef07fe41108993180cf779aeece363704c76483458603bbeeb693cffbbe5588d1f3535dcad888893e53d977424bb707201569a8d2", }, { inputLen: 7168, hash: "61da957ec2499a95d6b8023e2b0e604ec7f6b50e80a9678b89d2628e99ada77a5707c321c83361793b9af62a40f43b523df1c8633cecb4cd14d00bdc79c78fca5165b863893f6d38b02ff7236c5a9a8ad2dba87d24c547cab046c29fc5bc1ed142e1de4763613bb162a5a538e6ef05ed05199d751f9eb58d332791b8d73fb74e4fce95", keyedHash: "b42835e40e9d4a7f42ad8cc04f85a963a76e18198377ed84adddeaecacc6f3fca2f01d5277d69bb681c70fa8d36094f73ec06e452c80d2ff2257ed82e7ba348400989a65ee8daa7094ae0933e3d2210ac6395c4af24f91c2b590ef87d7788d7066ea3eaebca4c08a4f14b9a27644f99084c3543711b64a070b94f2c9d1d8a90d035d52", deriveKey: "11c37a112765370c94a51415d0d651190c288566e295d505defdad895dae223730d5a5175a38841693020669c7638f40b9bc1f9f39cf98bda7a5b54ae24218a800a2116b34665aa95d846d97ea988bfcb53dd9c055d588fa21ba78996776ea6c40bc428b53c62b5f3ccf200f647a5aae8067f0ea1976391fcc72af1945100e2a6dcb88", }, { inputLen: 7169, hash: "a003fc7a51754a9b3c7fae0367ab3d782dccf28855a03d435f8cfe74605e781798a8b20534be1ca9eb2ae2df3fae2ea60e48c6fb0b850b1385b5de0fe460dbe9d9f9b0d8db4435da75c601156df9d047f4ede008732eb17adc05d96180f8a73548522840779e6062d643b79478a6e8dbce68927f36ebf676ffa7d72d5f68f050b119c8", keyedHash: "ed9b1a922c046fdb3d423ae34e143b05ca1bf28b710432857bf738bcedbfa5113c9e28d72fcbfc020814ce3f5d4fc867f01c8f5b6caf305b3ea8a8ba2da3ab69fabcb438f19ff11f5378ad4484d75c478de425fb8e6ee809b54eec9bdb184315dc856617c09f5340451bf42fd3270a7b0b6566169f242e533777604c118a6358250f54", deriveKey: "554b0a5efea9ef183f2f9b931b7497995d9eb26f5c5c6dad2b97d62fc5ac31d99b20652c016d88ba2a611bbd761668d5eda3e568e940faae24b0d9991c3bd25a65f770b89fdcadabcb3d1a9c1cb63e69721cacf1ae69fefdcef1e3ef41bc5312ccc17222199e47a26552c6adc460cf47a72319cb5039369d0060eaea59d6c65130f1dd", }, { inputLen: 8192, hash: "aae792484c8efe4f19e2ca7d371d8c467ffb10748d8a5a1ae579948f718a2a635fe51a27db045a567c1ad51be5aa34c01c6651c4d9b5b5ac5d0fd58cf18dd61a47778566b797a8c67df7b1d60b97b19288d2d877bb2df417ace009dcb0241ca1257d62712b6a4043b4ff33f690d849da91ea3bf711ed583cb7b7a7da2839ba71309bbf", keyedHash: "dc9637c8845a770b4cbf76b8daec0eebf7dc2eac11498517f08d44c8fc00d58a4834464159dcbc12a0ba0c6d6eb41bac0ed6585cabfe0aca36a375e6c5480c22afdc40785c170f5a6b8a1107dbee282318d00d915ac9ed1143ad40765ec120042ee121cd2baa36250c618adaf9e27260fda2f94dea8fb6f08c04f8f10c78292aa46102", deriveKey: "ad01d7ae4ad059b0d33baa3c01319dcf8088094d0359e5fd45d6aeaa8b2d0c3d4c9e58958553513b67f84f8eac653aeeb02ae1d5672dcecf91cd9985a0e67f4501910ecba25555395427ccc7241d70dc21c190e2aadee875e5aae6bf1912837e53411dabf7a56cbf8e4fb780432b0d7fe6cec45024a0788cf5874616407757e9e6bef7", }, { inputLen: 8193, hash: "bab6c09cb8ce8cf459261398d2e7aef35700bf488116ceb94a36d0f5f1b7bc3bb2282aa69be089359ea1154b9a9286c4a56af4de975a9aa4a5c497654914d279bea60bb6d2cf7225a2fa0ff5ef56bbe4b149f3ed15860f78b4e2ad04e158e375c1e0c0b551cd7dfc82f1b155c11b6b3ed51ec9edb30d133653bb5709d1dbd55f4e1ff6", keyedHash: "954a2a75420c8d6547e3ba5b98d963e6fa6491addc8c023189cc519821b4a1f5f03228648fd983aef045c2fa8290934b0866b615f585149587dda2299039965328835a2b18f1d63b7e300fc76ff260b571839fe44876a4eae66cbac8c67694411ed7e09df51068a22c6e67d6d3dd2cca8ff12e3275384006c80f4db68023f24eebba57", deriveKey: "af1e0346e389b17c23200270a64aa4e1ead98c61695d917de7d5b00491c9b0f12f20a01d6d622edf3de026a4db4e4526225debb93c1237934d71c7340bb5916158cbdafe9ac3225476b6ab57a12357db3abbad7a26c6e66290e44034fb08a20a8d0ec264f309994d2810c49cfba6989d7abb095897459f5425adb48aba07c5fb3c83c0", }, { inputLen: 16384, hash: "f875d6646de28985646f34ee13be9a576fd515f76b5b0a26bb324735041ddde49d764c270176e53e97bdffa58d549073f2c660be0e81293767ed4e4929f9ad34bbb39a529334c57c4a381ffd2a6d4bfdbf1482651b172aa883cc13408fa67758a3e47503f93f87720a3177325f7823251b85275f64636a8f1d599c2e49722f42e93893", keyedHash: "9e9fc4eb7cf081ea7c47d1807790ed211bfec56aa25bb7037784c13c4b707b0df9e601b101e4cf63a404dfe50f2e1865bb12edc8fca166579ce0c70dba5a5c0fc960ad6f3772183416a00bd29d4c6e651ea7620bb100c9449858bf14e1ddc9ecd35725581ca5b9160de04060045993d972571c3e8f71e9d0496bfa744656861b169d65", deriveKey: "160e18b5878cd0df1c3af85eb25a0db5344d43a6fbd7a8ef4ed98d0714c3f7e160dc0b1f09caa35f2f417b9ef309dfe5ebd67f4c9507995a531374d099cf8ae317542e885ec6f589378864d3ea98716b3bbb65ef4ab5e0ab5bb298a501f19a41ec19af84a5e6b428ecd813b1a47ed91c9657c3fba11c406bc316768b58f6802c9e9b57", }, { inputLen: 31744, hash: "62b6960e1a44bcc1eb1a611a8d6235b6b4b78f32e7abc4fb4c6cdcce94895c47860cc51f2b0c28a7b77304bd55fe73af663c02d3f52ea053ba43431ca5bab7bfea2f5e9d7121770d88f70ae9649ea713087d1914f7f312147e247f87eb2d4ffef0ac978bf7b6579d57d533355aa20b8b77b13fd09748728a5cc327a8ec470f4013226f", keyedHash: "efa53b389ab67c593dba624d898d0f7353ab99e4ac9d42302ee64cbf9939a4193a7258db2d9cd32a7a3ecfce46144114b15c2fcb68a618a976bd74515d47be08b628be420b5e830fade7c080e351a076fbc38641ad80c736c8a18fe3c66ce12f95c61c2462a9770d60d0f77115bbcd3782b593016a4e728d4c06cee4505cb0c08a42ec", deriveKey: "39772aef80e0ebe60596361e45b061e8f417429d529171b6764468c22928e28e9759adeb797a3fbf771b1bcea30150a020e317982bf0d6e7d14dd9f064bc11025c25f31e81bd78a921db0174f03dd481d30e93fd8e90f8b2fee209f849f2d2a52f31719a490fb0ba7aea1e09814ee912eba111a9fde9d5c274185f7bae8ba85d300a2b", }, // // additional vectors that have not been verified with upstream. // they mainly exist to check that our results do not change for very large inputs. // { inputLen: 4 << 20, // 4 MB hash: "4e94e6f582581a0f3855f3ce504b153e951e65036fe9e2f010b7e25473c54f9837d7b96d9b118cc52d9355b3a29569cbc089752c10081c47bd92e4395e5c02189d2231f218722a0d99790d9c9b69355b0fd9ff5837128a14e369dbadf3eb8e0e1d127c3bb7d3346f57c45962b863a1e9a75d5178abfb0cbcb6e43c352fcd32eba985d2", keyedHash: "182b531d06d2705f68e23dc6a5580481f3342ded15cece016b58e0922e75c0e337b279c31c1108cb768b12a56289d53bc20fb9397d25b2dd58a4489ad24edc9f3f7ba9ea8da9b2a13813d7d0126f612269ce8f44cab5afd623c1bdbfe1d28f03ad1dd2e7afd3fa7249fabb4466c83b86e3a231912a7c320985f7200544558f9a74d4bf", deriveKey: "14689cc67a8329afabf4ddfb9c5bd23b910ffcc69fb59beb934f867608f1005a55b9f2cb7c44d358a2bf9158b4d6b0cb3d114b1f681f25ba5ef2c8a92789d0c44374f2629905ed4ffcdbdf652e1bd745635adbb280e0ba5aa2c7501266ce0ad558ebf576aa5bfc1b45db879bf680fde43ae56dcbe06f993eafc8a5effec9180da943e1", }, }