<p>English – Technology & Policy, by Jethro Beekman (https://jbeekman.nl/blog)</p>
<h2>On the recent side-channel attacks on Intel SGX</h2>
<p>Published 2017-03-01T12:00:00Z</p>
<p><a href="https://en.wikipedia.org/wiki/Side-channel_attack">Side-channel attacks</a> are
my favorite attack in computer security because they poke giant holes in the
abstraction and security models that system designers are using. When I hear of
a new attack avenue in this space, my first reaction often is “wow, that is
<em>so</em> cool” and “I didn’t even think about that.”</p>
<p>In the past week, <a href="https://arxiv.org/abs/1702.07521">two</a>
<a href="https://arxiv.org/abs/1702.08719">papers</a> have been published on arXiv
detailing side-channel attacks on <a href="https://software.intel.com/en-us/sgx">Intel
SGX</a>. While the existence of such attacks
should be taken seriously by people designing systems using Intel SGX (which
includes yours truly), these particular attacks are not very interesting.</p>
<p>First of all, these attacks really shouldn’t come as a surprise to anyone
following this space. Cache-based side-channel attacks are a well-known attack
vector. Many papers detailing new techniques have been published over the last
couple of years. There’s <a href="http://www.cs.tau.ac.il/~tromer/papers/cache.pdf">Evict+Time and
Prime+Probe</a>,
<a href="https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-yarom.pdf">Flush+Reload</a>,
<a href="https://arxiv.org/abs/1511.04594">Flush+Flush</a>, etc. <a href="https://eprint.iacr.org/2016/613">Ge et
al.</a> present a good overview of the current
state of the art. Intel explicitly states in their <a href="https://software.intel.com/sites/default/files/managed/ae/48/Software-Guard-Extensions-Enclave-Writers-Guide.pdf">Enclave Writer’s
Guide</a>
that they don’t protect against attacks at cache line or higher granularity.</p>
<p>Second of all, both attacks are exploiting well-known flaws in modular
exponentiation implementations. We <em>know</em> how to do constant-time RSA. On top
of that, both papers are at best slightly misleading in describing their attack
targets.</p>
<p>From the <a href="https://arxiv.org/abs/1702.07521">first paper</a>:</p>
<blockquote>
<p>As our victim enclave we chose an RSA implementation from the Intel IIP
crypto library in the Intel SGX SDK. The attacked decryption variant is a
fixed-size sliding window exponentiation, the code is available online at
<a href="https://github.com/01org/linux-sgx/blob/6662022/external/crypto_px/sources/ippcp/src/pcpngrsamontstuff.c#L336">[32]</a>.
The Intel IIP library includes also a variant of RSA that is hardened against
cache attacks
<a href="https://github.com/01org/linux-sgx/blob/6662022/external/crypto_px/sources/ippcp/src/pcpngrsamontstuff.c#L438">[33]</a>.</p>
</blockquote>
<p>If you look at how these two variants are used, you can see that only
<a href="https://github.com/01org/linux-sgx/blob/6662022/external/crypto_px/sources/ippcp/src/pcpngrsaencodec.c#L180">computations with the public
exponent</a>
are done with the “vulnerable” variant, whereas <a href="https://github.com/01org/linux-sgx/blob/6662022/external/crypto_px/sources/ippcp/src/pcpngrsaencodec.c#L265">computations with the private
exponent</a>
use the “hardened” variant. So, unless you are somehow swapping your public and
private exponents, using this crypto library as documented will prevent this
attack for you.</p>
<p>From the <a href="https://arxiv.org/abs/1702.08719">second paper’s abstract</a>:</p>
<blockquote>
<p>We perform a Prime+Probe cache side-channel attack on a co-located SGX
enclave running an up-to-date RSA implementation that uses a constant-time
multiplication primitive.</p>
</blockquote>
<p>Even though the library they use might have a multiplication primitive that is
constant-time, as the authors explain further on in the paper, the modular
exponentiation primitive is not. In fact, the modular exponentiation algorithm
is the textbook example of an algorithm with a secret-dependent branch:</p>
<div class="language-c++ highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span style="color:#0a8;font-weight:bold">int</span> modexp(<span style="color:#0a8;font-weight:bold">int</span> base, <span style="color:#0a8;font-weight:bold">int</span> exponent, <span style="color:#0a8;font-weight:bold">int</span> modulus) {
    <span style="color:#0a8;font-weight:bold">int</span> result = <span style="color:#00D">1</span>;
    <span style="color:#777">// square-then-multiply must scan the exponent from its most significant bit down</span>
    <span style="color:#080;font-weight:bold">for</span> (<span style="color:#0a8;font-weight:bold">int</span> i = exponent.bits() - <span style="color:#00D">1</span>; i >= <span style="color:#00D">0</span>; i--) {
        result = modsqr(result, modulus);
        <span style="color:#080;font-weight:bold">if</span> ((exponent >> i) & <span style="color:#00D">1</span>) { <span style="color:#777">// access bit `i`: the secret-dependent branch</span>
            result = modmul(result, base, modulus);
        }
    }
    <span style="color:#080;font-weight:bold">return</span> result;
}
</pre></div>
</div>
</div>
<p>If for each iteration of the loop you can detect whether the multiplication
happened or not, you can reconstruct the individual bits of the exponent. This
is a fun attack, but as mentioned, this is basically the most well-known timing
attack, <a href="https://www.eng.tau.ac.il/~yash/infosec-seminar/2005/kocher95.pdf">first described by Kocher 22 years
ago</a>. Oh and
the library? It implements the blinding mitigation also mentioned in that
seminal paper.</p>
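<p>To make the leak concrete, here is a minimal Python sketch. It assumes an idealized attacker who sees a perfect trace of squarings ("S") and multiplications ("M"); real cache attacks only get a noisy approximation of such a trace. The function names are made up for illustration and do not come from either paper or from any library.</p>

```python
def modexp_with_trace(base, exponent, modulus):
    """Left-to-right square-and-multiply that records which operations ran."""
    result = 1 % modulus
    trace = []
    for i in reversed(range(exponent.bit_length())):
        result = (result * result) % modulus      # always square
        trace.append("S")
        if (exponent >> i) & 1:                   # secret-dependent branch
            result = (result * base) % modulus    # multiply only for 1-bits
            trace.append("M")
    return result, trace


def recover_exponent(trace):
    """Rebuild the exponent from the operation trace alone."""
    exponent = 0
    for op in trace:
        if op == "S":
            exponent <<= 1    # every squaring starts a new bit, initially 0
        else:
            exponent |= 1     # a multiplication means that bit was 1
    return exponent


secret = 0b1011001
value, trace = modexp_with_trace(7, secret, 1009)
assert value == pow(7, secret, 1009)
assert recover_exponent(trace) == secret
```

<p>The attacker never looks at the operands, only at the sequence of operations, which is why constant-time multiplication alone does not help.</p>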
<p>To summarize: Yes, programs running with Intel SGX are vulnerable to
side-channel attacks. The same side-channel attacks that have been used for
years on modern x86 platforms. This is well-documented. SGX does present a
slightly different threat model which makes deployment of side-channel attacks
more likely. Hopefully everyone using SGX is implementing countermeasures. They
do exist, and are already implemented by most cryptography libraries.</p>
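<p>As one example of such a countermeasure, base blinding can be sketched as follows. This is a toy illustration with a textbook-sized key, not the code of any particular library, and Python's built-in <code>pow</code> is itself not constant-time. The function name and the parameters <code>n = 3233</code>, <code>e = 17</code>, <code>d = 2753</code> are assumptions chosen for the example; <code>pow(r, -1, n)</code> needs Python 3.8 or newer.</p>

```python
import math
import secrets


def blinded_private_op(c, d, e, n):
    """Compute c^d mod n without exponentiating the attacker-chosen c directly."""
    while True:
        r = secrets.randbelow(n - 2) + 2          # random r in [2, n-1]
        if math.gcd(r, n) == 1:                   # r must be invertible mod n
            break
    blinded = (c * pow(r, e, n)) % n              # hide c behind r^e
    blinded_result = pow(blinded, d, n)           # equals (c^d * r) mod n
    return (blinded_result * pow(r, -1, n)) % n   # strip the blinding factor


# Toy RSA key, far too small for real use.
n, e, d = 3233, 17, 2753
message = 65
ciphertext = pow(message, e, n)
assert blinded_private_op(ciphertext, d, e, n) == message
```

<p>Because a fresh random <code>r</code> is used for every operation, the values fed into the exponentiation are unpredictable to the attacker.</p>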
<h2>New OPT STEM extension rules</h2>
<p>Published 2016-03-09T12:00:00Z</p>
<p>Today, the <a href="https://www.ice.gov/news/releases/sevp-stem-opt">U.S. Department of Homeland Security
announced</a> that the <a href="https://s3.amazonaws.com/public-inspection.federalregister.gov/2016-04828.pdf">new rules
regarding the “Optional Practical Training STEM
extension”</a>
for international students in the U.S. on F-1 visas will be published in the
Federal Register on Friday. The new rules are a result of a lawsuit against DHS
that would otherwise have invalidated the old rules. The new rules are
“better” in some respects and “worse” in others. In this blog post I will
review some of the most important changes and how they will affect
international students.</p>
<p><strong>Update 2016-03-09:</strong> A previous version of this post suggested that one could
get two consecutive STEM extensions. Careful reading of the rule counters this,
as does an explicit comment in the supplementary information: “DHS clarifies
that the final rule, as with the proposed rule, does not allow students to
obtain back-to-back STEM OPT extensions.”</p>
<h3 id="reasons-for-the-stem-extension">Reasons for the STEM extension</h3>
<p>The DHS is very clear in the rules and the accompanying comments that the goal
of OPT is practical training for students, and not to bridge a gap in the U.S.
labor force. They see practical training as a vital part of a good education,
and providing OPT is a key mechanism in staying competitive in the
international education market. Therefore, they basically disregarded any
comments that talk about a shortage or overage in U.S. STEM workers. There is
however a provision that employers can’t use a student on the STEM extension to
replace a U.S. worker, and also that students need to be paid the same as other
workers in similar positions. These requirements seem to be designed to assuage
fears of “cheap foreign labor taking our jobs.”</p>
<p>Because the justification is practical training, you might ask why there is
this specific extension for STEM workers, and not just a general length
increase for all OPT. DHS clarifies this as follows:</p>
<blockquote>
<p>[…] because of the specific nature of [STEM students’] studies and
fields and the increasing need for enhancement of STEM skill application
outside of the classroom. DHS also found, as noted previously, that unlike
post-degree training in many non-STEM fields, training in STEM fields often
involves multi-year research projects as well as multi-year grants from
institutions such as the NSF.</p>
<p>Many STEM OPT practical training opportunities are research related, as
indicated by the fact that the employer that retains the most STEM OPT
students is the University of California system and that two other
universities are among the top six of such employers (Johns Hopkins
University and Harvard University).</p>
</blockquote>
<h3 id="basics">Basics</h3>
<p>First up: some straightforward changes. The length of the STEM extension is now
2 full years instead of the 17 months under the old rule. The “cap-gap
extension”—where OPT is automatically extended if the student has an approved
petition for an H-1B visa—is maintained. Students are now allowed to be
unemployed for a maximum total of 150 days during their initial OPT and their
STEM extension.</p>
<h3 id="training-plan-for-stem-opt-students">Training Plan for STEM OPT Students</h3>
<p>One of the big changes to the STEM extension is that students and employers now
need to come up with a training plan in order to get the extension. Students
must fill out a new Form I-983, “Training Plan for STEM OPT Students,” together
with their employer and file that with their STEM extension request.</p>
<p>The new article 8 CFR 214.2(f)(10)(ii)(C)(7) says:</p>
<blockquote>
<p>The training plan described in the Form I-983 […] form must identify goals
for the STEM practical training opportunity, including specific knowledge,
skills, or techniques that will be imparted to the student, and explain how
those goals will be achieved through the work-based learning opportunity with
the employer; describe a performance evaluation process; and describe methods
of oversight and supervision.</p>
</blockquote>
<p>The form is not available at the time of writing, but there is what seems to be
a <a href="http://www.reginfo.gov/public/do/DownloadDocument?objectID=62195800">draft of the instructions accompanying the
form</a>. The
relevant parts of the instructions basically echo the rule above.</p>
<p>At this time it’s not very clear how detailed the plan needs to be. Can
applicants just write down some business lingo (e.g. “improving core
competencies”) or do the goals need to be more substantial? Under the current
rules, it seems that the “Designated School Official” (i.e. someone who works
at your school’s international office) needs to gauge whether the proposed plan
meets the regulatory requirements.</p>
<p>Once the extension is approved, students will need to evaluate themselves
annually, according to the plan. Their employers will need to sign their
evaluation as well.</p>
<p>There are just too many unknowns at this point to really know how this will
affect the types of jobs international students will be able to get and how
this will affect the amount of time they have to spend on things besides their
normal job responsibilities.</p>
<h3 id="employer-employee-relationship">Employer-employee relationship</h3>
<p>While the actual rules text doesn’t seem to touch on this, the comments and
clarifications accompanying the rules suggest that it might be harder to work
in “unusual” work arrangements, such as start-up companies:</p>
<blockquote>
<p>There are several aspects of the STEM OPT extension that do not make it apt
for certain types of arrangements, including multiple employer arrangements,
sole proprietorships, employment through “temp” agencies, employment through
consulting firm arrangements that provide labor for hire, and other
relationships that do not constitute a bona fide employer-employee
relationship.</p>
</blockquote>
<p>One of these aspects seems to be that someone else at the same company you work
for needs to sign your Form I-983:</p>
<blockquote>
<p>[…] students cannot qualify for STEM OPT extensions unless they will be
bona fide employees of the employer signing the Training Plan, and the
employer that signs the Training Plan must be the same entity that employs
the student and provides the practical training experience.</p>
</blockquote>
<p>But:</p>
<blockquote>
<p>STEM OPT extensions may be employed by new “start-up” businesses so long as
all regulatory requirements are met, including that the employer adheres to
the training plan requirements, remains in good standing with E-Verify, will
provide compensation to the STEM OPT student commensurate to that provided to
similarly situated U.S. workers, and has the resources to comply with the
proposed training plan.</p>
</blockquote>
<p>and</p>
<blockquote>
<p>[…] any ownership interest in the employer entity (such as stock options),
[must be] commensurate with the compensation provided to other similarly
situated U.S. workers.</p>
</blockquote>
<p>So while the rules seem to prevent running a sole proprietorship under the STEM
extension, it seems entirely feasible that another employee at your startup can
fulfill all the supervisory training requirements. If you’re starting out on
your own, you could use your first year of regular OPT to get your company off
the ground, and hopefully by the time you need to file for the STEM extension
you have a co-founder or such who can provide the necessary “training.”</p>
<h3 id="summary">Summary</h3>
<p>I’m pretty positive in general about the new rules, and am of course glad that
DHS acted swiftly to make new rules after the court decision in August. I’m
somewhat concerned about the rules around the training plan, but we’ll see how
that’s going to work out in practice. Everything else seems to be an
improvement (big or small) upon the previous rules.</p>
<h2>Intel has full control over SGX</h2>
<p>Published 2015-10-13T12:00:00Z</p>
<p>Intel has full control over what software you can run in SGX. This might seem
redundant: Intel makes the processor, so of course they have full control. Yet
the truth is slightly more inconvenient. When an Intel processor fails to run the
instructions in your standard software (whether it runs them incorrectly or not at all), that is
a defect at best and a breach of contract at worst. Yet the SGX instruction set
<em>includes in its specification</em> that Intel has the authority to make this
go/no-go decision.</p>
<p>Let’s take a closer look at how exactly this is specified, since it is pretty
well-hidden. After creating and measuring a secure enclave using ECREATE, EADD,
and EEXTEND, the EINIT instruction needs to be executed before execution
control can be transferred to the enclave. The EINIT instruction has 2 inputs:
SIGSTRUCT and EINITTOKEN. SIGSTRUCT contains information about the enclave
including an expected hash of the memory. As the name implies, SIGSTRUCT is
also cryptographically signed using some key. EINITTOKEN also contains
information about the enclave including the same expected hash of the memory as
well as the expected public key for the signature. EINITTOKEN must be
<a href="https://en.wikipedia.org/wiki/Message_authentication_code">MAC</a>ed using the
so-called <em>launch key</em>. Both SIGSTRUCT and EINITTOKEN are checked by EINIT and
must be valid for execution to proceed successfully.</p>
<p>Since the launch key is a symmetric key, surely it is not
widely distributed and most likely is CPU-specific. But how can one obtain this
key? The EGETKEY instruction can be used to obtain SGX keys, including the
launch key. But this is a user-mode instruction that can only be executed from
inside an enclave. There seems to be a chicken-and-egg problem here: to launch
an enclave, we need the launch key. To get the launch key, we need to launch an
enclave! Here’s the catch: the EINITTOKEN need not be valid if SIGSTRUCT is
signed by an Intel key that is baked into the processor.</p>
<p>Thus, Intel can distribute an Intel-signed “launch enclave” that is able to
hand out correctly-MACed EINITTOKENs that can then be used to start other
enclaves. But they can include whatever logic they want in the launch enclave
so Intel can at its sole discretion choose not to MAC a particular EINITTOKEN.</p>
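<p>The resulting control flow can be modeled in a few lines of Python. This is a deliberately simplified toy model of the checks described above, not the real EINIT semantics: signature verification is reduced to a string comparison, HMAC-SHA256 stands in for whatever MAC the hardware uses, and all names and key values are made up for illustration.</p>

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable, Optional

INTEL_SIGNER = "intel-key"            # stand-in for the Intel key baked into the CPU
LAUNCH_KEY = b"cpu-specific-secret"   # stand-in for the launch key


@dataclass
class SigStruct:
    enclave_hash: bytes   # expected hash of the enclave memory
    signer: str           # stand-in for "signature verified against this key"


@dataclass
class EinitToken:
    enclave_hash: bytes
    signer: str
    mac: bytes


def _token_mac(enclave_hash: bytes, signer: str) -> bytes:
    return hmac.new(LAUNCH_KEY, enclave_hash + signer.encode(), hashlib.sha256).digest()


def launch_enclave(enclave_hash: bytes, signer: str,
                   policy: Callable[[bytes, str], bool]) -> Optional[EinitToken]:
    """Model of the Intel-signed launch enclave: it alone can MAC tokens."""
    if not policy(enclave_hash, signer):
        return None   # the go/no-go decision is entirely up to the policy
    return EinitToken(enclave_hash, signer, _token_mac(enclave_hash, signer))


def einit(measured_hash: bytes, sigstruct: SigStruct,
          token: Optional[EinitToken]) -> bool:
    """Model of the EINIT checks described in the text."""
    if sigstruct.enclave_hash != measured_hash:
        return False
    if sigstruct.signer == INTEL_SIGNER:
        return True   # Intel-signed enclaves need no valid token
    if token is None:
        return False
    return (hmac.compare_digest(token.mac, _token_mac(token.enclave_hash, token.signer))
            and token.enclave_hash == measured_hash
            and token.signer == sigstruct.signer)
```

<p>In this model, an enclave signed by anyone other than Intel can only start if <code>launch_enclave</code> agreed to issue a token for it, which is exactly the control point discussed here.</p>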
<p>As with most things SGX, this “feature” is severely underdocumented. The terms
“launch key” and “launch enclave” are only mentioned a few times in the SGX
programming reference and never in the whitepapers or tutorials. At the time of
writing, nowhere else on the Internet is there any mention of these keywords,
except for one <a href="https://www.quora.com/What-are-some-good-uses-for-Intel-Software-Guard-Extensions-SGX">insightful Quora
answer</a>
that I wish I had read months ago.</p>
<p>What reason could Intel have for this architecture? Along with the fact that
<a href="/blog/sgx-hardware-first-look">SGX is being disabled by default</a>, this looks
like Intel is again just setting this security technology up for failure due to
the lack of widespread adoption by developers and users alike (cf. TXT, SMX,
TPM).</p>
<h2>SGX Hardware: A first look</h2>
<p>Published 2015-10-08T12:00:00Z, updated 2015-12-09T12:00:00Z</p>
<p>Without much fanfare, Intel has released <a href="https://software.intel.com/en-us/isa-extensions/intel-sgx">Software Guard Extensions
(SGX)</a> in Skylake.
When I say “without much fanfare,” I mean practically only the following
paragraph hidden on page 3 of a <a href="http://download.intel.com/newsroom/kits/core/6thgen/pdfs/6th_Gen_Intel_Core-Intel_Xeon_Factsheet.pdf">press fact
sheet</a>:</p>
<blockquote>
<p><strong>BETTER SECURITY.</strong> The Skylake architecture has been designed to enable better
security, including Intel® Software Guard Extensions (Intel® SGX) that can
provide an additional level of hardware-based protection by putting data into
a secure container on the platform, and Intel® Memory Protection Extensions
(Intel® MPX) that can help prevent buffer flow attacks. [What’s a buffer flow
attack? <em>Ed.</em>] To be fully utilized, Intel SGX and Intel MPX require additional
software capabilities, which will begin to be delivered by the ecosystem
later this year.</p>
</blockquote>
<p>It has been extremely difficult to find actual hardware that supports SGX. BIOS
support is required–the BIOS needs to set aside memory for the Enclave Page
Cache (EPC)–but of course no vendor will mention anything about this on their
website, nor will they (be able to) answer when you inquire regarding this
specific issue.</p>
<p>To my delight, after repeatedly searching Google’s past-week results for “intel sgx” over
the last few months, I finally found a driver download site that
linked this <a href="http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=F84XC">Dell
driver</a>.
According to Dell’s website, this driver is compatible with the following machine models:</p>
<ul>
<li>Inspiron 11 i3153</li>
<li>Inspiron 11 i3158</li>
<li>Inspiron 13 i7353</li>
<li>Inspiron 13 i7359</li>
<li>Inspiron 15 i7568</li>
</ul>
<p>At first I couldn’t find these models mentioned anywhere, but a few days later
the i7359 showed up at
<a href="http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&Description=dell+i7359">NewEgg</a>
and then at <a href="http://frys.com/search?query_string=dell+i7359">Frys</a>. So, I drove
to Sunnyvale (where Frys had the i7359-2435SLV in stock) and I can now confirm
that <strong>SGX is real</strong>:</p>
<div style="width:100%;text-align:center">
<a href="/img/blog/sgx-hardware-first-look/bios1.jpeg"><img src="/img/blog/sgx-hardware-first-look/bios1-small.jpeg" style="width:45%;margin-right:3.3%" /></a>
<a href="/img/blog/sgx-hardware-first-look/bios2.jpeg"><img src="/img/blog/sgx-hardware-first-look/bios2-small.jpeg" style="width:45%" /></a>
</div>
<p>It’s interesting to note that SGX was disabled in the BIOS by default, so most
consumers will not be able to benefit from this feature at all.</p>
<p>The maximum size of the EPC on this laptop is 128MB. This means that enclaves
requiring more memory than that will need regular paging between the EPC and
main memory. It’s not clear whether such a copy would require re-encryption of
the page; the EPC itself is already encrypted, so it might not be necessary.</p>
<p>The laptop comes with Windows 10, which runs excruciatingly slowly–not surprising
considering it only has 4GB of RAM. I installed Arch Linux on it because it’s
one of the few distros that has an installer with a very recent kernel
(4.2.2), required for such new hardware.</p>
<p>I collected some CPUID and MSR information to see what SGX features are
supported:</p>
<blockquote>
<p><strong>CPUID</strong></p>
<pre><code>7h(0h)   00000000h 029c67afh 00000000h 00000000h
         EBX bit 2 (SGX) is set

12h(0h)  00000001h 00000000h 00000000h 0000241fh
         EAX bit 0: SGX version 1 supported
         EBX = 0: no extended SSA features supported
         EDX[7:0] = 1fh: max. enclave size 2^31 bytes (32-bit mode)
         EDX[15:8] = 24h: max. enclave size 2^36 bytes (64-bit mode)

12h(1h)  00000036h 00000000h 0000001fh 00000000h
         EAX = 36h: all enclave attributes supported
         ECX = 1fh: all XSAVE bits supported

12h(2h)  80200001h 00000000h 05d80001h 00000000h
         EAX[31:12], EBX: EPC physical address 80200000h
         ECX[31:12], EDX: EPC size 93.5 MiB
</code></pre>
<p><strong>MSR</strong></p>
<pre><code>3ah      00000000_00040005h (IA32_FEATURE_CONTROL)
         bit 18 (SGX_ENABLE) is set
         bit 0 (LOCK) is set
</code></pre>
</blockquote>
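<p>For reference, the register values above can be decoded mechanically. The following Python sketch takes the raw EAX/EBX/ECX/EDX values as integers (it does not execute CPUID itself) and follows the bit layout of CPUID leaf 12h and the IA32_FEATURE_CONTROL MSR as I read them in the SGX programming reference. The function names are my own.</p>

```python
def decode_sgx_capabilities(eax, ebx, ecx, edx):
    """Decode CPUID leaf 12h, sub-leaf 0."""
    return {
        "sgx1": bool(eax & 1),
        "sgx2": bool(eax & 2),
        "miscselect": ebx,                                # extended SSA features
        "max_enclave_size_32": 1 << (edx & 0xFF),         # bytes, 32-bit mode
        "max_enclave_size_64": 1 << ((edx >> 8) & 0xFF),  # bytes, 64-bit mode
    }


def decode_epc_section(eax, ebx, ecx, edx):
    """Decode CPUID leaf 12h, sub-leaf 2 and up; returns (base, size) or None."""
    if eax & 0xF != 1:                 # type 1 marks a valid EPC section
        return None
    base = ((ebx & 0xFFFFF) << 32) | (eax & 0xFFFFF000)
    size = ((edx & 0xFFFFF) << 32) | (ecx & 0xFFFFF000)
    return base, size


def decode_feature_control(msr):
    """Decode the SGX-related bits of IA32_FEATURE_CONTROL (MSR 3ah)."""
    return {"lock": bool(msr & 1), "sgx_enable": bool(msr & (1 << 18))}


caps = decode_sgx_capabilities(0x00000001, 0x00000000, 0x00000000, 0x0000241F)
base, size = decode_epc_section(0x80200001, 0x00000000, 0x05D80001, 0x00000000)
print(caps["sgx1"], hex(base), size / 2**20, decode_feature_control(0x00040005))
```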
<p>I’m currently writing a simple Linux kernel driver to be able to actually use
SGX. I managed to generate a Page Fault using the <code>ENCLS[EBLOCK]</code> instruction,
so at least something seems to be working.</p>
<p>I really wish Intel would be more forthcoming with information about and
developer support for SGX. The hardly-announced release and default-disabled
BIOS setting don’t warrant much hope for the future of SGX.</p>
<p>In the meantime, I intend to write more blog posts in the near future as I try
to get SGX up and running. Here’s a cliffhanger for you: the Dell driver
package mentioned earlier contains a file <code>aesm_service.exe</code> that contains the
string “SGX EPID provisioning network failure.” I’ll try to tell you more about
it next time.</p>
<p><strong>Update 2015-12-09:</strong> Please see my
<a href="https://github.com/jethrogb/sgx-utils">sgx-utils</a> repository for any
open-source SGX utilities, including a bare-bones development Linux driver.</p>
<h2>On OpenSSH and Logjam</h2>
<p>Published 2015-05-20T12:00:00Z</p>
<p><a href="https://weakdh.org/">Recent work</a> showing the feasibility of calculating
discrete logarithms on large integers has put the Diffie-Hellman key exchange
parameters we use every day in the spotlight. I have looked at what this means
for SSH key exchange. In short, on your <strong>SSH server</strong>, do the following:</p>
<pre><code>awk '{ if ($5 <= 2000) printf "#"; print }' /etc/ssh/moduli > /tmp/large_moduli
mv /tmp/large_moduli /etc/ssh/moduli
</code></pre>
<p>And put the following in your <code>sshd_config</code>:</p>
<pre><code>KexAlgorithms curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha1,diffie-hellman-group-exchange-sha256
</code></pre>
<p>Note that <code>curve25519-sha256@libssh.org</code> is only supported in <a href="http://www.openssh.com/txt/release-6.5">OpenSSH
6.5</a> and up, and only works reliably in
<a href="http://www.openssh.com/txt/release-6.7">OpenSSH 6.7</a> and up. On your <strong>SSH
client</strong>, put the following in your <code>ssh_config</code>:</p>
<pre><code>KexAlgorithms curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha1
</code></pre>
<p>If with this configuration you are unable to connect to some SSH servers, and
you need to add <code>diffie-hellman-group-exchange-sha1</code> or
<code>diffie-hellman-group-exchange-sha256</code> to the supported list of algorithms, you
should recompile your SSH client with a <code>DH_GRP_MIN</code> of 2048, so that a server
can’t force your client to use a weak group.</p>
<h3 id="technical-details">Technical details</h3>
<p>Now follows a detailed explanation of these recommendations. The following key
exchange mechanisms are supported in the current version (6.8) of OpenSSH:</p>
<ul>
<li><code>curve25519-sha256@libssh.org</code></li>
<li><code>ecdh-sha2-nistp256</code></li>
<li><code>ecdh-sha2-nistp384</code></li>
<li><code>ecdh-sha2-nistp521</code></li>
<li><code>diffie-hellman-group1-sha1</code></li>
<li><code>diffie-hellman-group14-sha1</code></li>
<li><code>diffie-hellman-group-exchange-sha1</code></li>
<li><code>diffie-hellman-group-exchange-sha256</code></li>
</ul>
<p>The first four mechanisms, <code>curve25519-sha256@libssh.org</code>,
<code>ecdh-sha2-nistp256</code>, <code>ecdh-sha2-nistp384</code>, <code>ecdh-sha2-nistp521</code>, do not use
prime-field Diffie-Hellman and are not affected. <a href="http://nmav.gnutls.org/2011/12/price-to-pay-for-perfect-forward.html">Previous
work</a>
shows that these mechanisms are much faster when used at the same security
level, so you should use them!</p>
<p>The <code>diffie-hellman-group1-sha1</code> mechanism uses the fixed 1024-bit <a href="https://www.ietf.org/rfc/rfc2409.txt">Oakley
Group 2</a> (not the 768-bit group 1, as the
name of the mechanism might suggest). This group is within the range of being a
viable target for nation-state attackers, and should not be used.</p>
<p>The <code>diffie-hellman-group14-sha1</code> mechanism uses the fixed 2048-bit <a href="https://www.ietf.org/rfc/rfc3526.txt">Oakley
Group 14</a>, which should be secure enough
for now.</p>
<p>The <code>diffie-hellman-group-exchange-sha1</code> and
<code>diffie-hellman-group-exchange-sha256</code> mechanisms let the client and server
negotiate a custom DH group. The client sends a tuple «min, n, max» to the
server, indicating the client’s minimum, preferred and maximum group size.
<a href="https://www.ietf.org/rfc/rfc4419.txt">According to the RFC</a>,</p>
<blockquote>
<p>Servers and clients SHOULD support groups with a modulus length of k
bits, where 1024 <= k <= 8192. The recommended values for min and
max are 1024 and 8192, respectively.</p>
</blockquote>
<p>The OpenSSH server selects a suitable group from a pre-generated set of groups,
installed system-wide in <code>/etc/ssh/moduli</code> (falling back to <code>/etc/ssh/primes</code>),
using the <code>choose_dh</code> function in
<a href="https://github.com/openssh/openssh-portable/blob/master/dh.c"><code>dh.c</code></a>. In case
no suitable group is found, the code defaults to Oakley Group 14, which is
safe. A pre-generated set is <a href="https://github.com/openssh/openssh-portable/blob/master/moduli">distributed with the OpenSSH
source</a> and
many binary distributions and is infrequently changed. The group sizes
distributed with OpenSSH are 1024, 1536, 2048, 3072, 4096, 6144, and 8192 bits,
with about 30 groups per size. The OpenSSH-distributed 1024-bit groups are
well-known and within the range of being a viable target for nation-state
attackers, and as such should not be used.</p>
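<p>For those who prefer not to use awk, the filtering step from the top of this post can be sketched in Python. This is a rough equivalent rather than a drop-in replacement: it assumes the fifth whitespace-separated field of each moduli line is the group size, leaves existing comment lines alone, and the sample lines below use fake, truncated moduli.</p>

```python
def comment_out_small_moduli(lines, max_rejected_size=2000):
    """Prefix moduli lines whose size field is <= max_rejected_size with '#'."""
    filtered = []
    for line in lines:
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        has_size = len(fields) >= 5 and fields[4].isdigit()
        if not is_comment and has_size and int(fields[4]) <= max_rejected_size:
            filtered.append("#" + line)   # disable this group
        else:
            filtered.append(line)         # keep the line unchanged
    return filtered


sample = [
    "# Time Type Tests Tries Size Generator Modulus",
    "20150101000000 2 6 100 1023 2 ABCDEF",   # fake 1024-bit entry
    "20150101000000 2 6 100 2047 2 ABCDEF",   # fake 2048-bit entry
]
for line in comment_out_small_moduli(sample):
    print(line)
```

<p>Only the 1023 entry ends up commented out; the header comment and the 2047 entry pass through untouched.</p>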
<p>It is possible to generate your own set of groups, in which case it would be
safer to use a 1024-bit group, but you might as well go for larger groups. The
<code>ssh-keygen</code> man page mentions that “It is important that … both ends of a
connection share common moduli.” That statement should not be interpreted as
“both server and client need to have the same moduli configured”, as the server
sends the chosen modulus to the client. As a case in point, the OpenSSH client
does not access the system-wide moduli file at all during connection setup.</p>
<p>Speaking of the client, it usually offers the RFC-specified minimum of 1024
bits. There is nothing preventing a server from using that value and offering a
well-known (and thus weak) group. So, a standard client shouldn’t use the
custom group key exchange mechanisms, unless there is a way to change the
minimum group size.</p>
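<p>The effect of raising the minimum can be sketched as a simple client-side check. This is an illustration of the idea behind a larger <code>DH_GRP_MIN</code>, not OpenSSH's actual code; the class and function names are hypothetical, and the "moduli" below are arbitrary odd numbers of the right size rather than real primes.</p>

```python
from dataclasses import dataclass


@dataclass
class GroupRequest:
    """The «min, n, max» tuple a client sends for group exchange."""
    min_bits: int
    preferred_bits: int
    max_bits: int


def client_accepts(request: GroupRequest, modulus: int) -> bool:
    """Accept the server's modulus only if its size lies within the requested range."""
    return request.min_bits <= modulus.bit_length() <= request.max_bits


rfc_default = GroupRequest(1024, 2048, 8192)   # RFC-recommended minimum
hardened = GroupRequest(2048, 2048, 8192)      # minimum raised to 2048 bits

weak_modulus = (1 << 1023) | 1     # 1024-bit stand-in, not a real prime
strong_modulus = (1 << 2047) | 1   # 2048-bit stand-in, not a real prime

assert client_accepts(rfc_default, weak_modulus)      # server may pick a weak group
assert not client_accepts(hardened, weak_modulus)     # hardened client refuses it
assert client_accepts(hardened, strong_modulus)
```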
<h2>Lenovo ThinkPad HDD Password</h2>
<p>Published 2015-03-09T00:00:01Z</p>
<p>Modern SSDs (at least the ones made by Intel and Samsung) always encrypt all
stored data using AES. The encryption key used is stored in nonvolatile memory
on the SSD. One of the reasons for this is that to securely wipe the drive now
you just need to overwrite the encryption key with a new random one. This way,
you don’t need to erase every flash block, which is very bad for durability.
The encryption key can optionally be encrypted using a 32-byte
“security password”, the configuration of which is overloaded on the ATA
security feature set. If you trust the hardware manufacturer to actually
implement this securely, this would seem to provide a very solid and fast
option for encrypted persistent storage.</p>
<p>To be able to boot off of such an encrypted drive, it needs to be unlocked
before the OS’s bootloader can be read, which requires BIOS support. Luckily,
my Lenovo ThinkPad T420s does support this: you can configure a drive password
in the BIOS setup screen and from then on the BIOS will ask for a password upon
startup. Now here’s the catch: it turns out that when you take this drive and
put it in a different machine, it is impossible to unlock the drive. This would
mean that if my laptop dies but the drive were still intact <em>I would be unable
to access the data on the drive, even though I know the password!</em></p>
<p>A couple of weeks ago I finally decided to get to the bottom of this by reverse
engineering the Lenovo UEFI BIOS on my laptop. The goal was simple: to find the
code path from <em>password input</em> to <em>ATA security unlock output</em> and reproduce
it. I have <a href="/blog/2015/03/reverse-engineering-uefi-firmware/">detailed the reverse engineering process in another blog
post</a>. Here’s the
algorithm:</p>
<div class="MathJax_Preview">\textit{AtaPassword} \gets \textrm{SHA}_{256}\left( \textrm{SHA}_{256}(\textit{Password}) \parallel \textit{AtaIdentity}_\textit{SerialNumber} \parallel \textit{AtaIdentity}_\textit{ModelNumber} \right)</div>
<script type="math/tex; mode=display">\textit{AtaPassword} \gets \textrm{SHA}_{256}\left( \textrm{SHA}_{256}(\textit{Password}) \parallel \textit{AtaIdentity}_\textit{SerialNumber} \parallel \textit{AtaIdentity}_\textit{ModelNumber} \right)</script>
<p>The inputs are <span class="MathJax_Preview">\textit{Password}</span><script type="math/tex">\textit{Password}</script> which is the user-supplied password and
<span class="MathJax_Preview">\textit{AtaIdentity}</span><script type="math/tex">\textit{AtaIdentity}</script> which is the <a href="https://msdn.microsoft.com/en-us/library/windows/hardware/ff559006%28v=vs.85%29.aspx">ATA Identify Device data
structure</a>.
The output <span class="MathJax_Preview">\textit{AtaPassword}</span><script type="math/tex">\textit{AtaPassword}</script> gets sent to the drive. Why do they use
this algorithm? It’s actually somewhat clever: the S/N and M/N act as a salt,
such that a hash sniffed off of the ATA bus will only be able to unlock that
one drive, and not any other drives that use the same password.</p>
<p>That’s the good part. The bad part is that the algorithm above is not quite
complete. Here is the <em>actual</em> algorithm:</p>
<div class="MathJax_Preview">\textit{PasswordHash} \gets \textrm{SHA}_{256}\left( \left( \textrm{ToScanCodes}(\textrm{LowerCase}(\textit{Password})) \parallel ␀^{64} \right)_{1:64} \right)_{1:12}</div>
<script type="math/tex; mode=display">\textit{PasswordHash} \gets \textrm{SHA}_{256}\left( \left( \textrm{ToScanCodes}(\textrm{LowerCase}(\textit{Password})) \parallel ␀^{64} \right)_{1:64} \right)_{1:12}</script>
<div class="MathJax_Preview">\textit{SN} \gets \textit{AtaIdentity}_\textit{SerialNumber} \qquad \textit{MN} \gets \textit{AtaIdentity}_\textit{ModelNumber}</div>
<script type="math/tex; mode=display">\textit{SN} \gets \textit{AtaIdentity}_\textit{SerialNumber} \qquad \textit{MN} \gets \textit{AtaIdentity}_\textit{ModelNumber}</script>
<div class="MathJax_Preview">\textit{AtaPassword} \gets \textrm{SHA}_{256}\left( \textit{PasswordHash} \parallel \textrm{SwapBytes}(\textit{SN}) \parallel \textrm{SwapBytes}(\textit{MN}) \right)</div>
<script type="math/tex; mode=display">\textit{AtaPassword} \gets \textrm{SHA}_{256}\left( \textit{PasswordHash} \parallel \textrm{SwapBytes}(\textit{SN}) \parallel \textrm{SwapBytes}(\textit{MN}) \right)</script>
<p>The function <span class="MathJax_Preview">\textrm{ToScanCodes}</span><script type="math/tex">\textrm{ToScanCodes}</script> translates the characters
<code>1234567890qwertyuiopasdfghjkl;zxcvbnm␣</code> to integers in the ranges 2–11, 16–25,
30–39, 44–50, 57–57, respectively, while dropping other characters.
<span class="MathJax_Preview">\textrm{SHA}_{256}</span><script type="math/tex">\textrm{SHA}_{256}</script> is the well-known hash function. <span class="MathJax_Preview">\textrm{SwapBytes}</span><script type="math/tex">\textrm{SwapBytes}</script>
is the POSIX <code>swab</code> function; it swaps odd and even bytes.</p>
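<p>The first step of the algorithm can be sketched in C. This is a minimal illustration, not the firmware’s code: the name <code>to_scancodes</code> and the row table are assumptions made for this example, and the mapping is taken directly from the ranges listed above.</p>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Keyboard rows from the post, with the scan code of each row's first key. */
static const struct { const char *keys; unsigned char first; } ROWS[] = {
    { "1234567890", 2 },
    { "qwertyuiop", 16 },
    { "asdfghjkl;", 30 },
    { "zxcvbnm", 44 },
    { " ", 57 },
};

/* Hypothetical helper: lower-case the password, map it to scan codes, drop
 * unmapped characters, and zero-pad or truncate to exactly 64 bytes. */
static void to_scancodes(const char *password, unsigned char out[64])
{
    size_t n = 0;
    memset(out, 0, 64);
    for (; *password != '\0' && n < 64; password++) {
        int c = tolower((unsigned char)*password);
        for (size_t r = 0; r < sizeof ROWS / sizeof ROWS[0]; r++) {
            const char *hit = strchr(ROWS[r].keys, c);
            if (hit != NULL) {
                out[n++] = (unsigned char)(ROWS[r].first + (hit - ROWS[r].keys));
                break;
            }
        }
    }
}
```

<p>The resulting 64-byte buffer is what would be fed to SHA-256 to produce the password hash, of which only the first 12 bytes are kept.</p>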
<p>There are a couple of peculiarities in the algorithm that reduce the security.
First of all, I’m not sure why the characters get converted into scancodes. The
UEFI BIOS is well-equipped to deal with keyboard layouts, so that just seems
unnecessary. It also reduces the entropy to only 5.3 bits per character, making
short passwords very insecure. What’s worse though, is that only 12 bytes of
the password hash are used, putting an upper bound of 96 bits on the entropy.
If your password is sampled uniformly at random from the available scancodes,
don’t bother making it longer than 18 characters.</p>
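<p>As a quick check of those figures, assuming the password is drawn uniformly from the 38 characters that ToScanCodes keeps (10 digits, 26 letters, semicolon, and space):</p>
<div class="MathJax_Preview">\log_2 38 \approx 5.25 \textrm{ bits per character}, \qquad \frac{96}{5.25} \approx 18.3</div>
<script type="math/tex; mode=display">\log_2 38 \approx 5.25 \textrm{ bits per character}, \qquad \frac{96}{5.25} \approx 18.3</script>
<p>So 18 characters carry about 94.5 bits, and a 19th character would already exceed the 96-bit cap.</p>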
<p>The other weird thing is the <span class="MathJax_Preview">\textrm{SwapBytes}</span><script type="math/tex">\textrm{SwapBytes}</script> function. This means that
if your model number is
<code style="word-break:break-all">Samsung␣SSD␣840␣EVO␣500GB␣…</code>, that part of the
input to the hash function will be
<code style="word-break:break-all">aSsmnu␣gSS␣D48␣0VE␣O05G0␣B…</code>. Why is that?
Between the ATA Identify Device data structure being defined in terms of 16-bit
words and the UEFI specification using 16-bit wide characters, while the model
and serial number are encoded as 8-bit ASCII, I can only assume that someone
messed up some endianness conversion somewhere.</p>
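<p>The byte swap itself is easy to reproduce. The sketch below is an in-place variant written for this example (POSIX <code>swab</code> copies between two buffers instead); the name <code>swap_bytes</code> is an assumption.</p>

```c
#include <stddef.h>

/* Swap each pair of adjacent bytes in place. A trailing odd byte, if any,
 * is left untouched. */
static void swap_bytes(char *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

<p>Applied to the first 26 bytes of the space-padded model number <code>Samsung SSD 840 EVO 500GB </code>, this yields the swapped string shown above.</p>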
<p>Today, I am <a href="https://github.com/jethrogb/lenovo-password">releasing a tool to unlock your
drive</a>. If despite all the
above—96 bits is more entropy than most passwords have—you still decide to use
the Lenovo BIOS to do your password management, you can use this to unlock your
drive in the event of hardware failure. You will need <code>hdparm</code> to talk to your
drive. If the password hash contains a ␀ character, you’ll need to patch
<code>hdparm</code> to be able to use that. I tested this on my own setup, but you may
want to verify it actually works before you start depending on it.</p>
tag:jbeekman.nl,2015-03-08:/blog/2015/03/reverse-engineering-uefi-firmware/Reverse Engineering UEFI Firmware – Technology & Policy2015-03-09T00:00:00Z2015-03-09T00:00:00Z<p>In order to <a href="/blog/2015/03/lenovo-thinkpad-hdd-password/">figure out how my BIOS drive password
worked,</a> I had to
reverse-engineer the firmware that comes with my laptop. You can find the
binary blobs on the update CD that Lenovo provides, and it turns out these
blobs are actually UEFI images. UEFI firmware is made up of many different
loadable modules (drivers, shared libraries, etc.), which are stored in the
Portable Executable (PE) image format. These modules can be extracted from the
image using Nikolaj Schlej’s excellent UEFIExtract (from
<a href="https://github.com/LongSoft/UEFITool">UEFITool</a>). Once you have all the PE
modules, the real reversing can begin.</p>
<p>It helps to understand how UEFI works. The Internet contains a wealth of
information, and here are two articles to get you started: <a href="http://mjg59.dreamwidth.org/18773.html">Getting started
with UEFI development</a> and <a href="http://x86asm.net/articles/uefi-programming-first-steps/">UEFI
Programming - First
Steps</a>. The main
problem that makes reverse engineering hard is that while the firmware consists
of over 300 loadable modules, there is no dynamic linker. Instead, the entry
point of a module gets passed a pointer to a “protocol” registry. A protocol
is basically an interface, or in other words a struct of function pointers. The
registry is keyed by globally unique identifiers (GUIDs). To call into another
module, you need to look up a GUID in the registry and then call some function
returned in the interface.</p>
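<p>To make the idea concrete, here is a toy version of such a registry in plain C. It only illustrates the pattern; it is not the UEFI API, and every name in it (<code>guid_t</code>, <code>install_protocol</code>, <code>locate_protocol</code>, <code>adder_protocol_t</code>) is made up for this sketch.</p>

```c
#include <stddef.h>
#include <string.h>

typedef struct { unsigned char bytes[16]; } guid_t;   /* 16 raw bytes */

/* A "protocol" is just a struct of function pointers. */
typedef struct { int (*add)(int a, int b); } adder_protocol_t;

#define MAX_PROTOCOLS 8
static struct { guid_t guid; void *iface; } registry[MAX_PROTOCOLS];
static size_t registry_len;

/* Register an interface under a GUID; returns 0 on success, -1 if full. */
static int install_protocol(const guid_t *guid, void *iface)
{
    if (registry_len == MAX_PROTOCOLS)
        return -1;
    registry[registry_len].guid = *guid;
    registry[registry_len].iface = iface;
    registry_len++;
    return 0;
}

/* Look an interface up by GUID; returns NULL if nothing is registered. */
static void *locate_protocol(const guid_t *guid)
{
    for (size_t i = 0; i < registry_len; i++)
        if (memcmp(&registry[i].guid, guid, sizeof *guid) == 0)
            return registry[i].iface;
    return NULL;
}

/* Example provider module: implements and publishes one interface. */
static int add_impl(int a, int b) { return a + b; }
static adder_protocol_t adder = { add_impl };
```

<p>A provider would call <code>install_protocol(&amp;guid, &amp;adder)</code> on entry, and a consumer would call <code>locate_protocol(&amp;guid)</code> and cast the result to the struct it expects. Nothing in the binary names the callee, which is why static analysis of the call graph is so awkward.</p>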
<p>My first strategy to get some insight into the firmware was to collect GUIDs
from images and build a dependency graph. This turned out to be useless. The
UEFI image contains <em>PEI dependency sections</em> for each image, but the GUIDs
that are listed seem to have no relation to actually required protocols.
Furthermore, identifying GUIDs (also known as 16 random bytes) in binaries is
hard, and even when I managed to identify a section that seemed to store GUIDs,
there would be many GUIDs in such a section that were never referenced from
code in that image.</p>
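<p>Part of what makes GUIDs hard to spot is that the bytes in the binary are not in the order of the printed form: the first three fields are stored little-endian. A small sketch of the conversion, with the function name <code>guid_to_string</code> chosen for this example:</p>

```c
#include <stdio.h>

/* Format 16 raw bytes, as they appear in a binary, into the canonical GUID
 * string. The layout is a 32-bit field, two 16-bit fields (all stored
 * little-endian) and eight plain bytes; out must hold 37 bytes. */
static void guid_to_string(const unsigned char b[16], char out[37])
{
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x-%02x%02x%02x%02x%02x%02x",
             b[3], b[2], b[1], b[0], b[5], b[4], b[7], b[6],
             b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}
```

<p>For example, the byte sequence <code>54 73 e4 73 c5 b0 00 4e a7 14 9d 0d 5a 4f db fd</code> formats as <code>73e47354-b0c5-4e00-a714-9d0d5a4fdbfd</code>.</p>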
<p>To figure out the dependencies, I decided to actually run the modules and see
which protocols they look up and which ones they register. <em>Wait what, run UEFI
PE modules?</em> Yes, <a href="https://github.com/jethrogb/uefireverse/tree/master/efiperun">I wrote a tool called
<code>efiperun</code></a> that
can load PE modules into memory and simulate enough of what a UEFI environment
is supposed to look like to actually run them. Most modules will, upon entry,
look up some standard protocols, do some initialization, and register one or
more protocols that other modules can use.</p>
<p>With this information in hand, you can do more targeted reversing, trying to
identify interfaces and function signatures. For example,
<code>LenovoTranslateService.efi</code> installs a protocol
<code>e3abb023-b8b1-4696-98e1-8eedc3d3c63d</code>. This protocol turns out to have the
following interface:</p>
<div class="language-c highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span style="color:#080;font-weight:bold">struct</span> interface_e3abb023_b8b1_4696_98e1_8eedc3d3c63d
{
<span style="color:#088;font-weight:bold">void</span>(EFIAPI *translate)(<span style="color:#088;font-weight:bold">void</span>* _this, <span style="color:#088;font-weight:bold">const</span> <span style="color:#0a8;font-weight:bold">char</span>* input, <span style="color:#0a8;font-weight:bold">char</span>* output, size_t length);
};
</pre></div>
</div>
</div>
<p>With <code>efiperun</code> you can actually write code that calls into loaded EFI modules,
which makes it easy to test installed interfaces. Utilizing this functionality,
I was able to determine that the <code>translate</code> function above actually translates
an ASCII string to keyboard scan codes.</p>
<p>When doing reverse engineering, you always end up exploring branches that turn
out to be less fruitful. But the knowledge obtained exploring such a branch can
be useful in exploring other ideas. Now that I’ve set the stage with the
tools I’m going to use, I will describe the path that led to the discovery of
the algorithm. Keep in mind that this is a reconstruction and the order in
which I actually figured parts out is different.</p>
<h4 id="graphical-entry-point">Graphical entry point</h4>
<p><img src="/img/blog/reverse-engineering-uefi-firmware/hdp-prompt.png" alt="Lenovo HDP Prompt" align="right" /></p>
<p>The Lenovo firmware does not make heavy use of graphical elements, but the Hard
Drive Password prompt actually does display a small pictogram, pictured on the
right. Now, judging by the filenames, there are only a few modules that deal with graphics:</p>
<ul>
<li><code>SystemGraphicsConsoleDxe.efi</code></li>
<li><code>SystemHiiImageDisplayDxe.efi</code></li>
<li><code>SystemImageDecoderDxe.efi</code></li>
<li><code>SystemImageDisplayDxe.efi</code></li>
</ul>
<p>Each of these modules installs a single protocol that doesn’t use a <a href="https://github.com/jethrogb/uefireverse/blob/master/guiddb/efi_guid.c">well-known
GUID</a>,
so let’s see what modules call them. As it turns out, only
<code>SystemSplashDxe.efi</code> calls <code>SystemHiiImageDisplayDxe.efi</code>
(96ce4c12-55e4-4a1c-bbf3-73a5055fb364) and only <code>LenovoPromptService.efi</code> calls
<code>SystemImageDisplayDxe.efi</code> (71583a77-2789-4213-a83b-eef42afe85e0).
<code>SystemSplashDxe.efi</code> pretty much seems to be as advertised and even contains a
GIF file with the ThinkPad splash image. Upon further inspection,
<code>LenovoPromptService.efi</code> contains 21 BMP files, all related to displaying
password prompts. Bingo!</p>
<h4 id="password-control-program">Password control program</h4>
<p>The Prompt service installs a single protocol
56350810-2cb2-4aa0-96d2-66d1b8e1aac2 which is only called by
<code>LenovoPasswordCp.efi</code>. This module contains key code connecting various
password-related modules, and I’ll assume Cp means “control program”. Besides
the prompt service (for text input), it also calls into
<code>LenovoSoundService.efi</code> (e01fc710-ba41-493b-a919-53583368f6d9, for beeping
noises when you press an invalid key), <code>LenovoTranslateService.efi</code> (described
above) and <code>LenovoCryptService.efi</code> (73e47354-b0c5-4e00-a714-9d0d5a4fdbfd,
supposedly a crypto module—see next section).</p>
<p>The password control program has an interesting function at offset <code>0x8cc</code> that
calls only <code>SetMem</code>, <code>CopyMem</code> and the Crypto and Translate services. Here’s
roughly the code for this function:</p>
<div class="language-c highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span style="color:#088;font-weight:bold">void</span> _0x8cc(<span style="color:#088;font-weight:bold">const</span> CHAR16 in[<span style="color:#00D">64</span>], UINT8 out[<span style="color:#00D">16</span>])
{
UINT8 ascii[<span style="color:#00D">64</span>], scancode[<span style="color:#00D">64</span>], hash[<span style="color:#00D">32</span>];
BootServices->SetMem(out,<span style="color:#00D">16</span>,<span style="color:#00D">0</span>);
BootServices->SetMem(ascii,<span style="color:#00D">64</span>,<span style="color:#00D">0</span>);
BootServices->SetMem(scancode,<span style="color:#00D">64</span>,<span style="color:#00D">0</span>);
BootServices->SetMem(hash,<span style="color:#00D">32</span>,<span style="color:#00D">0</span>);
<span style="color:#080;font-weight:bold">for</span> (<span style="color:#0a8;font-weight:bold">int</span> i=<span style="color:#00D">0</span>;i<<span style="color:#00D">64</span>;i++)
{
ascii[i]=in[i];
}
<span style="color:#080;font-weight:bold">if</span> (TranslateService)
{
TranslateService->Translate(TranslateService,ascii,scancode,<span style="color:#00D">64</span>);
<span style="color:#080;font-weight:bold">if</span> (CryptService)
{
CryptService->SHA256(CryptService,scancode,<span style="color:#00D">64</span>,hash);
BootServices->CopyMem(out,hash,<span style="color:#00D">16</span>);
}
BootServices->SetMem(ascii,<span style="color:#00D">64</span>,<span style="color:#00D">0</span>);
BootServices->SetMem(scancode,<span style="color:#00D">64</span>,<span style="color:#00D">0</span>);
BootServices->SetMem(hash,<span style="color:#00D">32</span>,<span style="color:#00D">0</span>);
}
<span style="color:#080;font-weight:bold">else</span>
{
BootServices->SetMem(ascii,<span style="color:#00D">64</span>,<span style="color:#00D">0</span>);
}
}
</pre></div>
</div>
</div>
<p>I’ll assume that this function is used to hash a password input by the user.
There’s another interesting function at offset <code>0xa30</code>, which checks whether the
input <code>CHAR16</code> is in the character class <code>[0-9A-Za-z ;]</code>, which is used to limit
the possible characters in the password input.</p>
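<p>A character-class check like that amounts to something along these lines. This is a reconstruction of the behavior only, not the disassembled code, and the name <code>is_password_char</code> is an assumption.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a 16-bit character is in the class [0-9A-Za-z ;]. */
static bool is_password_char(uint16_t c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') || c == ' ' || c == ';';
}
```

<p>Ignoring case, this class has 38 members: 10 digits, 26 letters, the space, and the semicolon.</p>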
<p>I’ve made good progress identifying part of the path from password input to
security unlock command, but here I’ve hit a dead end. It’s not really clear
from where the password control program gets called and what happens to the
hash it outputs. I’ll try a different approach next, but first let’s talk about
the crypto service.</p>
<h4 id="crypto-service">Crypto service</h4>
<p>The password control program calls a function in the Crypto service at offset
<code>0x26e0</code>, which references three GUIDs that I hadn’t seen before:</p>
<ul>
<li>69188a5f-6bbd-46c7-9c16-55f194befcdf</li>
<li>d0b3d668-16cf-4feb-95f5-1ca3693cfe56</li>
<li>6c48f74a-b4df-461f-80c4-5cae8a85b7ee</li>
</ul>
<p>These GUIDs do not appear in any <code>efiperun</code> output. Instead, I just searched
all images for appearances of these GUIDs, and they appear in 10 other images.
A noteworthy appearance is in <code>SystemCryptSvcRt.efi</code> at offset <code>0x1c70</code>. Offset
<code>0x1c70</code> is referenced at offset <code>0x330</code>, where it is immediately followed by
the Unicode string “SHA256”. This is followed by a jump table at offset
<code>0x370</code>, which points to 3 jumps at offset <code>0x33c0</code> that jump to 3 functions at
offsets <code>0x753c</code>, <code>0x7570</code> and <code>0x760c</code>. The function at offset <code>0x753c</code>
references offset <code>0x2258</code>, <em>which stores the <a href="https://en.wikipedia.org/wiki/SHA-2#Pseudocode">hash initialization constants
for SHA256</a></em>! The rest of the
<code>SystemCryptSvcRt.efi</code> module also contains SHA256 round constants, and similar
strings and constants for other algorithms.</p>
<p>All in all this suggests that the Crypto service is a front for the
cryptographic routines in <code>SystemCryptSvcRt.efi</code> and that the password control
program calls SHA256. I wrote a small test program for the <a href="http://tianocore.sourceforge.net/wiki/Efi-shell">EFI shell</a> that
tests this:</p>
<div class="language-c highlighter-coderay"><div class="CodeRay">
<div class="code"><pre><span style="color:#088;font-weight:bold">void</span> buf2hexstr(VOID*buf,CHAR16*str,UINTN len)
{
UINTN i;
<span style="color:#088;font-weight:bold">static</span> CHAR16 hchars[<span style="color:#00D">16</span>]={<span style="color:#D20">'0'</span>,<span style="color:#D20">'1'</span>,<span style="color:#D20">'2'</span>,<span style="color:#D20">'3'</span>,<span style="color:#D20">'4'</span>,<span style="color:#D20">'5'</span>,<span style="color:#D20">'6'</span>,<span style="color:#D20">'7'</span>,<span style="color:#D20">'8'</span>,<span style="color:#D20">'9'</span>,<span style="color:#D20">'a'</span>,<span style="color:#D20">'b'</span>,<span style="color:#D20">'c'</span>,<span style="color:#D20">'d'</span>,<span style="color:#D20">'e'</span>,<span style="color:#D20">'f'</span>};
UINT8* buf_=(UINT8*)buf;
<span style="color:#080;font-weight:bold">for</span> (i=<span style="color:#00D">0</span>;i<len;i++)
{
*(str++)=hchars[*(buf_) >><span style="color:#00D">4</span>];
*(str++)=hchars[*(buf_++)&<span style="color:#02b">0xf</span>];
}
}
EFI_STATUS Initialize(...)
{
...
EFI_GUID guid={<span style="color:#02b">0x73e47354</span>,<span style="color:#02b">0xb0c5</span>,<span style="color:#02b">0x4e00</span>,{<span style="color:#02b">0xa7</span>,<span style="color:#02b">0x14</span>,<span style="color:#02b">0x9d</span>,<span style="color:#02b">0x0d</span>,<span style="color:#02b">0x5a</span>,<span style="color:#02b">0x4f</span>,<span style="color:#02b">0xdb</span>,<span style="color:#02b">0xfd</span>}};
<span style="color:#088;font-weight:bold">void</span>* intf;
<span style="color:#080;font-weight:bold">if</span> (SystemTable->BootServices->LocateProtocol(&guid,<span style="color:#069">NULL</span>,&intf)==EFI_SUCCESS)
{
<span style="color:#088;font-weight:bold">const</span> <span style="color:#0a8;font-weight:bold">char</span>* in=<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">TEST</span><span style="color:#710">"</span></span>;
<span style="color:#0a8;font-weight:bold">char</span> out[<span style="color:#00D">32</span>]={};
CHAR16 str[<span style="color:#00D">13</span>+(<span style="color:#00D">32</span>*<span style="color:#00D">2</span>)+<span style="color:#00D">2</span>]=L<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">SHA256 test: </span><span style="color:#710">"</span></span>;
((<span style="color:#088;font-weight:bold">void</span>(*)(<span style="color:#088;font-weight:bold">void</span>*,<span style="color:#088;font-weight:bold">const</span> <span style="color:#0a8;font-weight:bold">char</span>*,UINTN,<span style="color:#0a8;font-weight:bold">char</span>*))(*(<span style="color:#088;font-weight:bold">void</span>**)intf))(intf,in,<span style="color:#00D">4</span>,out);
buf2hexstr(out,str+<span style="color:#00D">13</span>,<span style="color:#00D">32</span>);
str[<span style="color:#00D">13</span>+(<span style="color:#00D">32</span>*<span style="color:#00D">2</span>)]=<span style="color:#D20">'\n'</span>;
SystemTable->ConOut->OutputString(SystemTable->ConOut, str);
SystemTable->ConOut->OutputString(SystemTable->ConOut,
L<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">Expected: 94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2</span><span style="color:#b0b">\n</span><span style="color:#710">"</span></span>);
}
<span style="color:#080;font-weight:bold">else</span>
{
SystemTable->ConOut->OutputString(SystemTable->ConOut,
L<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">Unable to load CryptService protocol</span><span style="color:#b0b">\n</span><span style="color:#710">"</span></span>);
}
...
}
</pre></div>
</div>
</div>
<p>Outputs:</p>
<pre><code>SHA256 test: 94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2
Expected: 94ee059335e587e501cc4bf90613e0814f00a7b08bc7c648fd865a2af6a22cc2
</code></pre>
<p>Success!</p>
<h4 id="hard-drive-communication">Hard-drive communication</h4>
<p>As mentioned, I discovered how the input password got hashed, but it still
needs to be sent to the drive. The UEFI standard defines the <em>ATA Pass Thru
Protocol</em>, which can be used to send raw ATA commands to a drive. This protocol
is very likely to be used for sending ATA security commands. This protocol is
not loaded upon initialization by any modules, but the GUID does appear in the
following modules:</p>
<ul>
<li><code>FdiskOem.efi</code></li>
<li><code>LenovoHdpManagerDxe.efi</code></li>
<li><code>LenovoMfgBenchEventDxe.efi</code></li>
<li><code>SystemAhciAtaAtapiPassThruDxe.efi</code></li>
<li><code>SystemAhciBusDxe.efi</code></li>
<li><code>SystemAhciBusSmm.efi</code></li>
<li><code>SystemIdeAtaPassThruDxe.efi</code></li>
<li><code>SystemIdeBusDxe.efi</code></li>
</ul>
<p>Wait a minute, is that second module called <em>Lenovo Hard Drive Password
Manager</em>? Why yes, it is. There’s a bunch of code in this module, but I found
an interesting function call chain for you:</p>
<ul>
<li>offset <code>0xce0</code>
<ul>
<li>offset <code>0x8a0</code>
<ul>
<li>CryptService.SHA256</li>
</ul>
</li>
<li>offset <code>0x144c</code>
<ul>
<li>offset <code>0x232c</code>
<ul>
<li>EFI_ATA_PASS_THRU_PROTOCOL.PassThru</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>The input to the SHA256 function is a parameter to the function at offset
<code>0xce0</code>, and data from an EFI runtime variable “LenovoHddSecInfoVar”. The
PassThru function is called with an ATA_OP_SECURITY_UNLOCK command block
including the hash generated just before. I assume the input to the function at
offset <code>0xce0</code> is the password hash from the password control program, but what
is the data in “LenovoHddSecInfoVar”? The <code>dmpstore</code> utility in the <a href="http://tianocore.sourceforge.net/wiki/Efi-shell">EFI shell</a>
will dump runtime variables. Here’s mine:</p>
<pre><code>Variable BS '2D8FBE63-3A04-4EF8-A8A4-77321DB5A9AB:LenovoHddSecInfoVar' DataSize = 8
00000000: 98 7D BC B7 00 00 00 00- *........*
</code></pre>
<p>From the code I know that the value is being used as a memory address, so let’s
use the <code>mem</code> utility to dump that:</p>
<pre><code> B7BC7D98: .. .. .. .. .. .. .. ..-18 E0 0F B8 00 00 00 00 ........*
B7BC7DA8: 98 DF 0F B8 00 00 00 00-.. .. .. .. .. .. .. .. *........
</code></pre>
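<p>The dumps list bytes in memory order, so each 8-byte group has to be read least significant byte first to get the address it encodes. A minimal sketch of that decoding (the helper name <code>read_le64</code> is made up for this example):</p>

```c
#include <stdint.h>

/* Decode 8 bytes, least significant first, into a 64-bit value. */
static uint64_t read_le64(const unsigned char b[8])
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}
```

<p>Read this way, <code>98 7D BC B7 00 00 00 00</code> is the address <code>B7BC7D98</code>, and the two groups found there decode to <code>B80FE018</code> and <code>B80FDF98</code>.</p>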
<p>Those are two more memory addresses, let’s see what’s there:</p>
<pre><code> B80FDF98: 61 53 73 6D 6E 75 20 67-53 53 20 44 34 38 20 30 *aSsmnu gSS D48 0*
B80FDFA8: 56 45 20 4F 30 35 47 30-20 42 20 20 20 20 20 20 *VE O05G0 B *
B80FDFB8: 20 20 20 20 20 20 20 20-.. .. .. .. .. .. .. .. *
B80FE018: 31 53 48 44 53 4E 46 41-30 42 38 35 39 34 20 45 *1SHDSNFA0B8594 E*
B80FE028: 20 20 20 20 .. .. .. ..-.. .. .. .. .. .. .. .. *
</code></pre>
<p>If you squint your eyes just right, those kind of read <code>Samsung SSD 840 EVO
500GB</code> and <code>S1DHNSAFB05849E</code>, the Model Number and Serial Number for my SSD,
respectively. Piecing all this together, you get the algorithm described in <a href="/blog/2015/03/lenovo-thinkpad-hdd-password/">my
other blog post</a>.</p>
<h4 id="conclusion">Conclusion</h4>
<p>As I mentioned, this story is the abridged version of how I found the password
hashing algorithm. In reality, I looked at many other modules, including many
hours spent looking at useless things. In the end though, I prevailed and found
what I was looking for, developing <a href="https://github.com/jethrogb/uefireverse">a bunch of
tools</a> in the process:</p>
<dl>
<dt>efiperun:</dt>
<dd>Load and run EFI PE image files in a regular OS environment.</dd>
<dt>guiddb:</dt>
<dd>Scan files for GUIDs and output them in C-source file format.</dd>
<dt>memdmp:</dt>
<dd>Dump UEFI memory using <a href="http://tianocore.sourceforge.net/wiki/Efi-shell">EFI shell</a>.</dd>
<dt>tree:</dt>
<dd>A Ruby abstraction for a firmware tree on your filesystem previously
extracted by <a href="https://github.com/LongSoft/UEFITool">UEFIExtract</a>.</dd>
</dl>
<p>I hope these tools are of use to someone. Patches welcome. ☺️</p>
tag:jbeekman.nl,2015-01-28:/blog/2015/01/nibble-sort/Parallel Nibble Sort – Technology & Policy2015-01-28T12:00:00Z2015-07-20T12:00:00Z<p><em>Update July 20, 2015: The winning solution by Alexander Monakov also uses a
sorting network but transposes the items to be sorted to sort 32 nibbles in
parallel with a length 60 network, instead of my 4 nibbles with a depth 9
network. <a href="http://www.hanshq.net/nibble_sort.html#winning-simd">Hans Wennborg</a>
has a nice write-up of that solution.</em></p>
<p>Professor John Regehr at University of Utah held a small <a href="http://blog.regehr.org/archives/1213">programming contest
for “nibble sort”</a>. The goal is to sort
nibbles in a 64-bit value, 1024 times, as fast as possible. For example, the
nibble sort of <code>0xbadbeef</code> is <code>0xfeedbba000000000</code>.</p>
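<p>To pin down what is being asked, here is a simple scalar version written for this post. It uses a counting sort rather than the sorting network described in the rest of this post, it is not the contest’s reference implementation, and the name <code>nibble_sort_ref</code> is an assumption.</p>

```c
#include <stdint.h>

/* Sort the 16 nibbles of a word so the largest ends up most significant.
 * Plain counting sort: count each nibble value, then emit from 15 down. */
static uint64_t nibble_sort_ref(uint64_t word)
{
    unsigned count[16] = {0};
    for (int i = 0; i < 16; i++)
        count[(word >> (4 * i)) & 0xf]++;

    uint64_t sorted = 0;
    for (int v = 15; v >= 0; v--)
        for (unsigned k = 0; k < count[v]; k++)
            sorted = (sorted << 4) | (uint64_t)v;
    return sorted;
}
```

<p>Calling <code>nibble_sort_ref(0xbadbeef)</code> returns <code>0xfeedbba000000000</code>, matching the example above.</p>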
<h3 id="algorithm">Algorithm</h3>
<p>I chose to implement the sort using a <a href="https://en.wikipedia.org/wiki/Sorting_network">sorting
network</a>. I used the following
minimum-depth network to sort 16 items, which was designed by David C. Van
Voorhis.</p>
<div style="text-align:center">
<img src="/img/nibble-sort.png" />
<p>Figure 1. From <i>“The Art of Computer Programming, Volume 3”.</i></p>
</div>
<p>The inputs are distributed in rows on the left, and each vertical line segment
compares the two numbers on the parallel lines, swapping them if the upper
element is less than the lower element.</p>
<h3 id="parallelization">Parallelization</h3>
<p>For each of the 9 stages, the elements to be sorted are split into two buckets,
and the same indices in each bucket are compared and potentially swapped at the
same time using SIMD instructions. The split for each stage is a different
permutation depending on which elements are to be compared. This process was
inspired by the paper <em>“Efficient implementation of sorting on multi-core SIMD
CPU architecture”</em>. After each stage, the buckets are then again combined into
a single list using the inverse permutation. The permutations of combining the
buckets from the last stage and splitting them again for the next stage can be
reduced to a single permutation.</p>
<div style="text-align:center">
<pre style="text-align:left;display:inline-block">
<b>function</b> sort(input):
b1:b2 <- input
<b>for</b> i := 1 <b>to</b> 9:
b1:b2 <- each_min(b1,b2):each_max(b1,b2)
b1:b2 <- permute(step=i,b1:b2)
<b>return</b> b1:b2
</pre>
<p>Algorithm 1. Parallel network sort.</p>
</div>
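<p>The min/max line of Algorithm 1 can be written out in scalar C as below. This is only a stand-in to show what one stage computes: a SIMD implementation would handle all lanes with one byte-wise minimum and one byte-wise maximum, and the per-stage permutation is not shown. The name <code>compare_exchange</code> is an assumption.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* One compare-exchange stage over two 8-byte buckets: for every index,
 * leave the smaller value in b1 and the larger value in b2. */
static void compare_exchange(uint8_t b1[8], uint8_t b2[8])
{
    for (size_t i = 0; i < 8; i++) {
        uint8_t lo = b1[i] < b2[i] ? b1[i] : b2[i];
        uint8_t hi = b1[i] < b2[i] ? b2[i] : b1[i];
        b1[i] = lo;
        b2[i] = hi;
    }
}
```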
<h3 id="implementation">Implementation</h3>
<p>On IA-32, the smallest unit that can be processed is a byte. Every 2 of the 16
nibbles in the input word are unpacked into the lower nibble of 2 bytes for a
total of 16 bytes to be sorted, and each bucket is 8 bytes. AVX2 can process 32
bytes or 4 buckets in parallel. <a href="https://github.com/regehr/nibble-sort/blob/master/beekman2.c">This
implementation</a>
runs more than 72× faster than the <a href="https://github.com/regehr/nibble-sort/blob/master/ref.c">reference
implementation</a> on my
test machine.</p>
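<p>The unpacking step can be illustrated in scalar form. The sketch below assumes least-significant-nibble-first ordering, which may differ from the linked implementation, and the function names are chosen for this example.</p>

```c
#include <stdint.h>

/* Spread the 16 nibbles of a word over 16 bytes, one nibble in the low
 * half of each byte, least significant nibble first. */
static void unpack_nibbles(uint64_t word, uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)((word >> (4 * i)) & 0xf);
}

/* Inverse of unpack_nibbles: gather 16 low nibbles back into one word. */
static uint64_t pack_nibbles(const uint8_t in[16])
{
    uint64_t word = 0;
    for (int i = 15; i >= 0; i--)
        word = (word << 4) | (uint64_t)(in[i] & 0xf);
    return word;
}
```

<p>Once each nibble sits in its own byte, byte-wise SIMD minimum, maximum, and shuffle instructions can operate on them directly.</p>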
<h3 id="discussion">Discussion</h3>
<p>Currently, the min/max operation takes ½ operation per word, while the permute
operation takes 1½ operations per word. The reason the permutation requires so
many instructions is that both buckets need to be in the same register for the
permute operation but they need to be in separate registers for the min/max
operation. By carefully considering the shuffling constants, it’s possible to
do permutations #3 and #8 in 1 operation per word and ½ operation per word,
respectively.</p>
<p>The application can be further sped up by using multiple threads to do the
sorting, since each of the 1024 elements can be sorted independently. When working
with much larger datasets, subsets of it can still be sorted individually, so
this algorithm scales very well.</p>