#blockchain#cryptography#security#ethereum#web3#seed-phrase#hacking

An Educational Insight into Seed Phrase Security


This article explores the security aspects of cryptocurrency seed phrases, examining both their mathematical strength and potential vulnerabilities. Through a proof-of-concept experiment using OpenAI and web3.js, we'll demonstrate that while seed phrases are mathematically secure, they can be compromised if leaked online, as evidenced by finding two active Ethereum accounts with small balances within 24 hours of testing.

This article is an exploration of the security surrounding seed phrases, commonly used in the crypto community. Its primary aim is educational, shedding light on the strengths and potential vulnerabilities of seed phrases.

While we delve into the creation of a script that generates and tests these phrases using web3.js, I emphasize the moral and legal importance of not exploiting these findings maliciously.

This is not a guidebook for exploitation.

Let's dive in.

1. What is a Seed Phrase? A Deep Dive

A seed phrase, sometimes known as a mnemonic phrase, stands as a cornerstone of cryptographic security within the blockchain domain.

But what is it really, and how does it work?

1.1 The Makeup of a Seed Phrase

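At its core, a seed phrase consists of 12 to 24 words. Twelve is the most common length and the one used throughout this article. The words need not be distinct: the same word can appear more than once in a phrase.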

These aren't just any words picked from a dictionary; they are chosen from a predetermined list specially curated for this purpose.

This list (BIP-0039) is composed of 2048 individual words, each chosen for its distinctiveness to prevent potential confusion. In the English list, for example, no two words share the same first four letters.
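To make the structure concrete, here is a minimal sketch of how BIP-39 turns 128 bits of randomness into 12 positions in the word list. Each word encodes 11 bits (2^11 = 2048), and the last 4 bits are a checksum taken from the SHA-256 hash of the entropy. The function name `entropyToIndices` is my own, and the word list itself is left out, so the output is a list of indices rather than words.

```javascript
const nodeCrypto = require("crypto");

// Convert 16 bytes (128 bits) of entropy into the 12 word-list
// indices (each 0..2047) that BIP-39 maps to words.
// Hypothetical helper for illustration; the word list is omitted.
function entropyToIndices(entropy) {
  if (entropy.length !== 16) {
    throw new Error("this sketch handles 128-bit entropy only");
  }
  const hash = nodeCrypto.createHash("sha256").update(entropy).digest();

  // 128 entropy bits followed by the first 4 bits of the hash (checksum).
  const entropyBits = [...entropy]
    .map((byte) => byte.toString(2).padStart(8, "0"))
    .join("");
  const checksumBits = hash[0].toString(2).padStart(8, "0").slice(0, 4);
  const bits = entropyBits + checksumBits; // 132 bits = 12 groups of 11

  const indices = [];
  for (let i = 0; i < bits.length; i += 11) {
    indices.push(parseInt(bits.slice(i, i + 11), 2));
  }
  return indices;
}

console.log(entropyToIndices(nodeCrypto.randomBytes(16)));
```

Looking up each index in the 2048-word list gives the phrase a wallet would display.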

1.2 The Mathematics of Security

The sheer math behind the seed phrase's security is staggering.

If one were to blindly guess a single word from the list of 2048, the odds stand at a mere 1 in 2048.

Now, if you compound those odds by attempting to guess all 12 words in order, the numbers become astronomical.

Let's break this down a bit. For the first word, you have a 1 in 2048 chance of guessing correctly. For the second word, you have another 1 in 2048 chance, and so on for all 12 words.

If you multiply these probabilities together, the odds of getting all 12 words correct are a mind-boggling 1 in 2048 to the power of 12.
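To put a number on that, the short sketch below computes 2048 to the power of 12 exactly and estimates how long it would take to try every sequence. The guessing rate of one trillion attempts per second is an assumption chosen purely for illustration, not a measured figure.

```javascript
// Number of ordered 12-word sequences drawn from a 2048-word list.
const WORDS = 2048n;
const LENGTH = 12n;
const sequences = WORDS ** LENGTH; // BigInt keeps the value exact

// Hypothetical attacker speed: one trillion guesses per second.
const GUESSES_PER_SECOND = 10n ** 12n;
const SECONDS_PER_YEAR = 31557600n; // 365.25 days

// Integer division: whole years needed to try every sequence once.
const yearsToExhaust = sequences / (GUESSES_PER_SECOND * SECONDS_PER_YEAR);

console.log(`sequences: ${sequences}`);
console.log(`decimal digits: ${sequences.toString().length}`);
console.log(`years to try them all: ${yearsToExhaust}`);
```

Even at that generous rate, the search would take on the order of 10^20 years.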

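One point deserves care here: the order of the words does not add a further factor. The figure of 2048 to the power of 12 already counts ordered sequences, so multiplying it by 12! (12 factorial) would count the same phrases more than once.

In fact, the number of valid phrases is slightly smaller than that figure. The final word of a 12-word phrase includes a 4-bit checksum, so the phrase carries 128 bits of entropy, which means 2 to the power of 128 possible valid phrases. That is still far beyond the reach of any brute-force search.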

1.3 Why It's Deemed Secure

Given the probabilities discussed above, it's evident why the crypto community holds the seed phrase in high regard.

It's not just about the words themselves but the combination and order that make them so resilient against brute-force attacks. It's like having a combination lock where the numbers range in the thousands and the sequence spans a dozen places.

The mechanism appears foolproof, with its security rooted deeply in mathematical principles.

2. Unraveling the Limitations of Seed Phrases

While seed phrases have long been celebrated for their robust security mechanisms, they are not without vulnerabilities.

As with most technological advancements, new challenges and threats emerge over time.

2.1 The Explosion of the Crypto User Base

The crypto landscape has witnessed an explosive growth in users over the years.

As more people are drawn to the allure of decentralized finance and the myriad opportunities blockchain offers, the number of active accounts in the crypto world multiplies.

Each of these users, more often than not, operates multiple accounts. Whether it's for diversifying assets, ensuring anonymity, or simply organizing their investments, the result is an increase in the number of seed phrases generated and in active use.

In mathematical terms, while the odds of guessing a specific seed phrase remain astronomically low, the probability of exploiting other vulnerabilities (notably social engineering) or taking advantage of a greater number of poorly protected seed phrases increases.
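A rough calculation shows why user growth barely changes the brute-force picture. The sketch below assumes 2^128 possible 12-word phrases (128 bits of entropy), one billion phrases in active use, and an attacker making one trillion guesses per second for a year. All three inputs are illustrative assumptions, and `hitProbability` is a hypothetical helper.

```javascript
// Approximate chance that at least one of `guesses` random phrases
// matches one of `activePhrases` phrases in use, out of 2^128.
function hitProbability(activePhrases, guesses) {
  const keyspace = 2 ** 128; // about 3.4e38, exact as a double
  // For tiny probabilities, 1 - (1 - p)^g is well approximated by g * p.
  return Math.min(1, (activePhrases / keyspace) * guesses);
}

const ACTIVE_PHRASES = 1e9;            // hypothetical
const GUESSES_IN_A_YEAR = 1e12 * 31557600; // hypothetical rate, 365.25 days

console.log(hitProbability(ACTIVE_PHRASES, GUESSES_IN_A_YEAR));
```

The result is on the order of one in ten billion per year, which is why the realistic threats are leaks and social engineering rather than guessing.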

2.2 Balancing Security with Practicality

The security mechanisms behind seed phrases are rooted in balancing user convenience and protection.

Too complex, and users might struggle with usability or even risk forgetting their phrases.

Too simple, and the system becomes more susceptible to breaches.

While seed phrases strike a commendable balance, the burgeoning crypto user base, combined with fast-paced technological advancements, necessitates a continuous evaluation of their long-term viability.

3. The Proof of Concept (POC): Putting Theory into Practice

Given the aforementioned challenges and the exponential growth of the crypto community, I was curious: how plausible is it really to stumble upon an active seed phrase?

To answer this, I embarked on a hands-on experiment, bridging advanced technology with the fundamental principles of seed phrases.

Recognizing the near-impossibility, mathematically, of randomly generating a seed phrase that is actually in use with a simple JavaScript script, I turned to a more unconventional approach: harnessing the power of OpenAI.

The goal of the POC is to determine whether a large language model (LLM) can exploit the fact that many seed phrases have likely already leaked on the web and may have been included in the data on which OpenAI's models are trained.

3.1 The Setup: Leveraging OpenAI's Extensive Reach

Given that OpenAI's models are trained on vast amounts of online text, I was intrigued to explore a hypothesis:

Could OpenAI, in its extensive data trawling, have potentially come across seed phrases that might have been accidentally leaked online?

A cron job was established to facilitate this exploration.

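This automated task was designed to prompt an OpenAI model at regular intervals to generate potential seed phrases built from the standard crypto list (BIP-0039). A model cannot search its training data directly, so the hope was that some of the generated phrases would echo text it had seen during training.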

3.2 The Testing Phase: Scrutinizing for Validity

With a set of phrases generated by OpenAI, the next phase focused on testing these phrases for any valid matches.
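One cheap local filter is worth sketching here. It is an assumption on my part about how such a pipeline could be tightened, not a description of the script used in this experiment. Because a 12-word BIP-39 phrase embeds a 4-bit checksum, only about 1 in 16 random word combinations is well-formed, and the rest can be rejected without contacting a node. The function `hasValidChecksum` is hypothetical and works on word-list indices (0 to 2047); mapping words to indices is omitted.

```javascript
const nodeCrypto = require("crypto");

// Return true if 12 word-list indices form a well-formed BIP-39 phrase,
// i.e. the last 4 bits match the SHA-256 checksum of the first 128 bits.
// Hypothetical helper for illustration.
function hasValidChecksum(indices) {
  const wellFormed =
    Array.isArray(indices) &&
    indices.length === 12 &&
    indices.every((i) => Number.isInteger(i) && i >= 0 && i < 2048);
  if (!wellFormed) return false;

  // 12 indices x 11 bits = 132 bits: 128 entropy bits + 4 checksum bits.
  const bits = indices.map((i) => i.toString(2).padStart(11, "0")).join("");
  const entropyBits = bits.slice(0, 128);
  const checksumBits = bits.slice(128);

  // Pack the entropy bits back into 16 bytes and hash them.
  const entropy = Buffer.from(
    entropyBits.match(/.{8}/g).map((byte) => parseInt(byte, 2))
  );
  const hash = nodeCrypto.createHash("sha256").update(entropy).digest();
  const expected = hash[0].toString(2).padStart(8, "0").slice(0, 4);

  return expected === checksumBits;
}

// Eleven times index 0 followed by index 3 is a well-formed phrase.
console.log(hasValidChecksum([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3]));
```

A phrase that passes this check is only well-formed. Whether it belongs to anyone is a separate question.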

Employing web3.js for Verification

For the verification process, web3.js was utilized.

This suite of libraries, essential for interacting with Ethereum nodes, was instrumental in checking whether any of the generated seed phrases corresponded to active Ethereum accounts.

Observing Account Balances

In instances where a match was found, the focus was to discreetly verify the account balance.

The intent here was purely investigative — to ascertain if these phrases, possibly retrieved from online leaks, were linked to active accounts.
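Balances come back from an Ethereum node in wei, the smallest unit of Ether (1 ether = 10^18 wei), so a raw balance has to be converted before it reads as a familiar amount. web3.js ships its own conversion utility. The hypothetical helper below, `weiToEtherString`, does the same job with plain BigInt arithmetic so that it runs without any dependency. It assumes a non-negative balance.

```javascript
const WEI_PER_ETHER = 10n ** 18n;

// Format a non-negative wei amount (BigInt) as a decimal ether string,
// without the rounding errors of floating-point division.
// Hypothetical helper for illustration.
function weiToEtherString(wei) {
  const whole = wei / WEI_PER_ETHER;
  const fraction = (wei % WEI_PER_ETHER)
    .toString()
    .padStart(18, "0")   // restore leading zeros of the fractional part
    .replace(/0+$/, ""); // drop trailing zeros
  return fraction ? `${whole}.${fraction}` : `${whole}`;
}

console.log(weiToEtherString(1500000000000000000n)); // "1.5"
console.log(weiToEtherString(1n));
```

Keeping the arithmetic in integers matters here because the balances of interest can be tiny fractions of one Ether.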

Setting Up Automated Notifications

An automated notification system was integral to the experiment.

It was configured to alert me via email whenever a matching account with a balance was identified. This system enabled continuous monitoring without the need for manual intervention.

3.3 Continual Execution and Ethical Considerations

The process, from generating potential seed phrases to verifying account matches, was automated to occur at ten-minute intervals.

This relentless cycle was key to probing the extensive data resources of OpenAI and testing the hypothesis at hand.

It's crucial to emphasize that this experiment was conducted with the utmost respect for ethical standards. The objective was not to exploit or access any funds but to evaluate the security implications in a controlled, ethical manner.

This exploration into the intersection of AI and crypto security serves as a reminder of the intricate challenges and responsibilities in safeguarding digital assets.

⚠️ This POC was not aimed at breaching security or capitalizing on vulnerabilities. It was an academic exercise, a deep dive into the probabilities and potential risks that loom in the ever-expanding realm of crypto. I want to emphasize that at no point did I access or exploit any account found during the research. My intentions remained purely investigative and educational.

4. The Findings: Surprising Results in a Short Time

Upon launching the experiment, the results came in faster than anticipated.

Here's a deep dive into the discoveries made during the brief span of the experiment and the implications they hold.

4.1 Immediate Results

The automated system was designed to churn out and test a vast array of seed phrases continuously.

Within the first 24 hours, to my astonishment, the cron job had successfully matched two distinct seed phrases to active Ethereum accounts.

What was particularly intriguing about these accounts was that they held a small amount of Ether — mere cents, suggesting that these accounts, while active, were not in regular use or had been previously compromised.

[Image: ETH balance]

This led me to a compelling hypothesis: the seed phrases for these two accounts had likely been leaked online at some point.

It's plausible that opportunistic individuals had already capitalized on these leaks, withdrawing the substantial funds and leaving behind only traces of Ether.

What makes this scenario particularly interesting is the role of OpenAI's training mechanisms. Because OpenAI's models are trained on a vast range of internet data, they could have incidentally picked up on these leaked seed phrases.

These phrases, having been made public through their leakage, became part of the data pool that OpenAI uses for its learning processes.

This occurrence, while fascinating, also underscores a significant concern: it demonstrates how public data leaks can have lingering effects, especially when advanced AI tools like OpenAI's models are involved, capable of unearthing and repurposing such data.

4.2 The Larger Implication

While the experiment unearthed only a couple of accounts with minuscule amounts, it's worth pondering the bigger picture.

In a vast community of crypto users, where seed phrases are generated by the millions, even the tiniest probability of matching a phrase can lead to significant consequences.

It underscores the need for continuous evolution in security practices, ensuring users are always a step ahead of potential vulnerabilities.

5. Conclusion: A Measured Perspective on Seed Phrase Security

5.1 Evaluating the Risks

The odds of random guesswork leading to a successful match of your seed phrase are indeed low.

However, the word 'low' shouldn't be mistaken for 'impossible'.

The experiment suggests that the realistic risk is a leaked phrase rather than a lucky guess: while the odds of compromise are slim, they aren't zero.

In a conventional sense, you might feel more threatened by tangible risks, like a home burglary, which, statistically, might be more likely.

5.2 The Prudent Approach to Crypto

Given the nascent stage of the cryptocurrency world and the rapid technological advancements, vigilance is key.

The findings advocate for a diversified approach.

Just as one wouldn't store all their life savings in a single place in the physical world, the same prudence should extend to the digital domain.

Spreading your crypto assets across multiple wallets can act as a safety net. Should one wallet be breached, it doesn't lead to a total wipeout.

5.3 A Final Word of Caution

Reflecting on the essence of this research, the primary motive was always educational.

It was an exercise in understanding potential vulnerabilities, not a guidebook for exploitation.

Anyone considering the use of such knowledge for ill intentions is urged to rethink. The ethical and legal ramifications aside, the essence of the crypto community lies in its shared ethos of trust and collective growth.

5.4 Safekeeping Your Digital Treasure

In the rapidly evolving world of blockchain and crypto, vigilance, knowledge, and continuous learning are your best allies. Adopt best practices, stay updated, diversify your holdings, and approach every transaction or decision with a measure of caution.

Safety first is always a good motto.

I welcome your feedback on this piece. The decision to publish was one I grappled with, concerned it might inadvertently guide those with malicious intentions. Whether it stays online is still under consideration.

Always prioritize safety, and remember the wisdom of diversifying — never keep all your assets in a single place.