<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p>In the case of vector registers, if it doesn't satisfy
the tall-register assumption, I feel that you can possibly
leave the input vector registers partially empty to solve
the problem?</p>
</div>
</blockquote>
</div>
</div>
<blockquote
cite="mid:CAGAvDYiVF0_=0V9sr_yzV=d87BSj_954i3ff9aWhhSqhVciqhQ@mail.gmail.com"
type="cite">
<div dir="auto">
<div dir="auto">You can, though I'm not sure of the costs. If
you use just half of the input vector registers (which might
mean you're just using the 128-bit SSE registers) you're
increasing the total vector reads you need to do, and you're
quadrupling the number of base case instances you have to
solve. Furthermore, I think you might need more vector
operations in each base case to manage the layout of the C
submatrix, making the base case even less efficient.</div>
</div>
</blockquote>
May not be that inefficient! Basically, you are doing redundant
computation. Say, 1st iteration, 1-4 elements in one AVX register,
2nd iteration, 3-7 elements in AVX, then 6-9, and so on.<br>
<blockquote
cite="mid:CAGAvDYiVF0_=0V9sr_yzV=d87BSj_954i3ff9aWhhSqhVciqhQ@mail.gmail.com"
type="cite">
<div dir="auto">
<div dir="auto">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<p>Best!</p>
<font color="#888888">
<p>-Yuan<br>
</p>
</font>
<div class="elided-text"> <br>
<div class="m_-248574004931951689moz-cite-prefix">On
2017年04月11日 07:56, TB Schardl wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="auto">
<div dir="ltr">Hey Charles,
<div><br>
</div>
<div>Still working though the paper, but it
got me thinking, and I realized that the
outer product base case we use for the
matrix-multiplication case study
essentially doesn't scale for their
problem.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Our code multiplies
subcolumns of matrix A by subrows of
matrix B to compute a submatrix of the
product matrix C. Basically, each
subcolumn or subrow is a full AVX
register, which is 256 bits. Because
we're multiplying matrices of doubles,
each of which are 64 bits, each subcolumn
or subrow stores 4 elements. The
resulting submatrix of C contains 16
elements, which can be (and needs to be)
completely stored in AVX registers.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Their matrix multiplication
problem, however, deals with 8-bit
integers. That means one AVX register
load stores 32 elements. So if we keep
using full AVX registers to store
subcolumns and subrow, one product between
a subcolumn of A and a subrow of B
produces elements in a 32x32 submatrix of
C. Storing that submatrix completely in
AVX registers, however, requires 32
registers, which is more than the number
of named AVX registers on the chip.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Of course there are other
base cases one could use, but the
outer-product base case and some
generalizations of it have problems when
the elements are so small. Cutting one of
the subrows or subcolumns in half (using
just half of the vector register) cuts the
size of the submatrix in half, but the
resulting submatrix still requires too
many AVX registers to store, particularly
when one accounts for the registers needed
to store the subcolumn, the subrow, and
their permutations. Cutting both the
subrow and subcolumn in half reduces the
submatrix size to a quarter its original
size, which can be fit in the available
registers. I'm not sure the resulting
code is particularly efficient, however,
given the bookkeeping it seems one needs
on the vector registers.</div>
<div dir="auto"><br>
</div>
<div>This issue reminds me of the reasoning
behind the tall-cache assumption. In
other words, it seems like we need the
number of vector registers to grow
quadratically with the number of elements
in each register in order to handle
problems like matrix multiplication.</div>
<div><br>
</div>
<div>Does all that make sense? Thoughts on
these thoughts?</div>
<div><br>
</div>
<div>Best,</div>
<div>TB</div>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Apr 10, 2017 2:24
AM, "Charles E. Leiserson" <<a
moz-do-not-send="true"
href="mailto:cel@mit.edu" target="_blank">cel@mit.edu</a>>
wrote:<br type="attribution">
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div dir="ltr">
<div class="gmail_default"
style="font-family:trebuchet
ms,sans-serif">Supertechies,<br>
<br>
</div>
<div class="gmail_default"
style="font-family:trebuchet
ms,sans-serif">Please see the attached
paper. Thoughts?<br>
<br>
</div>
<div class="gmail_default"
style="font-family:trebuchet
ms,sans-serif">Cheers,<br>
</div>
<div class="gmail_default"
style="font-family:trebuchet
ms,sans-serif">Charles<br clear="all">
</div>
<div>
<div
class="m_-248574004931951689m_2554045588306076500m_-6031687898788923739gmail_signature"
data-smartmail="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr"><span
style="font-family:trebuchet ms,sans-serif"><span
style="font-family:arial,helvetica,sans-serif"><br>
—</span><br>
Charles E.
Leiserson<br>
Edwin Sibley
Webster
Professor <br>
Margaret
MacVicar
Faculty Fellow<br>
</span>Professor
of <span
style="font-family:trebuchet
ms,sans-serif"><span
style="font-family:trebuchet ms,sans-serif">Computer Science and
Engineering<br>
<br>
Associate
Director and
Chief
Operating
Officer<br>
Computer
Science and
Artificial
Intelligence
Laboratory<br>
</span></span><span
style="font-family:trebuchet ms,sans-serif"><span
style="font-family:trebuchet
ms,sans-serif"><span
style="font-family:trebuchet ms,sans-serif">Massachusetts Institute of
Technology<br>
<br>
</span></span></span></div>
<div><span
style="font-family:trebuchet
ms,sans-serif"><span
style="font-family:trebuchet ms,sans-serif"><span
style="font-family:trebuchet
ms,sans-serif">Member
of the
National
Academy of
Engineering<br>
Fellow of
AAAS, ACM,
IEEE, and
SIAM.<br>
</span></span></span></div>
<div dir="ltr"><span
style="font-family:trebuchet ms,sans-serif"><br>
MIT CSAIL<br>
32 Vassar
Street,
#32-G768<br>
Cambridge, MA
02139<br>
USA<br>
<br>
MIT Phone: <a
moz-do-not-send="true" href="tel:%28617%29%20253-5833"
value="+16172535833"
target="_blank">+1-617-253-5833</a><br>
Web: <span
style="font-family:courier
new,monospace"><a
moz-do-not-send="true" href="http://people.csail.mit.edu/cel/"
target="_blank">http://people.csail.mit.edu/ce<wbr>l/</a></span><br>
Email: <span
style="font-family:courier new,monospace"><a moz-do-not-send="true"
href="mailto:cel@mit.edu"
target="_blank">cel@mit.edu</a></span></span><br>
—<br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
______________________________<wbr>_________________<br>
Supertech mailing list<br>
<a moz-do-not-send="true"
href="mailto:Supertech@lists.csail.mit.edu"
target="_blank">Supertech@lists.csail.mit.edu</a><br>
<a moz-do-not-send="true"
href="https://lists.csail.mit.edu/mailman/listinfo/supertech"
rel="noreferrer" target="_blank">https://lists.csail.mit.edu/ma<wbr>ilman/listinfo/supertech</a><br>
<br>
</blockquote>
</div>
</div>
</div>
<br>
<fieldset
class="m_-248574004931951689mimeAttachmentHeader"></fieldset>
<br>
<pre>______________________________<wbr>_________________
Supertech mailing list
<a moz-do-not-send="true" class="m_-248574004931951689moz-txt-link-abbreviated" href="mailto:Supertech@lists.csail.mit.edu" target="_blank">Supertech@lists.csail.mit.edu</a>
<a moz-do-not-send="true" class="m_-248574004931951689moz-txt-link-freetext" href="https://lists.csail.mit.edu/mailman/listinfo/supertech" target="_blank">https://lists.csail.mit.edu/<wbr>mailman/listinfo/supertech</a>
</pre>
</blockquote>
</div></div>
______________________________<wbr>_________________
Supertech mailing list
<a moz-do-not-send="true" href="mailto:Supertech@lists.csail.mit.edu">Supertech@lists.csail.mit.edu</a>
<a moz-do-not-send="true" href="https://lists.csail.mit.edu/mailman/listinfo/supertech" rel="noreferrer" target="_blank">https://lists.csail.mit.edu/<wbr>mailman/listinfo/supertech</a>
</blockquote></div>
</div></div></div>
</blockquote>
</body></html>