Kindling the Spark of Inspiration

Oct 28, 2023



At Spellbrush, one of the biggest problems we face as an AI tools company is human creativity. For all that people marvel at text-to-image technology, it still requires text. When confronted with a sophisticated text box that can show them Any Image in the World(tm), 50% of humans leave on the spot. The reason is simple: they don’t know what to type.

No matter the medium, the most fearsome enemy of creativity is the blank canvas.

This struggle is not a modern problem: since antiquity, humans have devised systems for stimulating creativity and ideation.

In this devlog, we’ll talk about imagination tools, and the unique way we’ve designed the niji mobile app to spark imagination.

Cleromancy: World’s oldest random numbers

One of the oldest known instances of such a system is the I Ching, a collection of hexagrams dating to around 1000 BCE.

The diviner draws stalks of yarrow and reads the result against a chart of 64 hexagrams, which in turn is used to interpret divine intent.

I Ching hexagrams, from Wikipedia

Now here, you may be wondering what divination has to do with inspiration. In fact, they are very similar. Consider this: the feared demons and dragons of olden days are the fantastical creatures we think of as imaginary today! And so we study systems like the I Ching, because “belief” is the precursor of “imagination”.

Some would call this the world’s first random number generator! If you think of our system as text-to-image, this is randomness-to-text! At the time, the I Ching not only provided an interesting source of ideas, it was revered as a direct connection into the doings of the divine and even informed state policies!
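To make “randomness-to-text” concrete, here is a minimal sketch of casting a hexagram in code. It assumes each of the six lines is a fair coin flip, which is a simplification: the traditional yarrow-stalk procedure is more elaborate. The function names and the binary index are ours for illustration; the index is not the traditional hexagram numbering.

```python
import random

# A hexagram is six stacked lines, each broken (0) or solid (1),
# so there are 2**6 = 64 possible hexagrams.
LINES_PER_HEXAGRAM = 6


def cast_hexagram(rng: random.Random) -> tuple[int, ...]:
    """Draw six random lines, listed bottom to top.

    Assumption: every line is a fair coin flip. This only illustrates
    the idea and does not model the yarrow-stalk procedure.
    """
    return tuple(rng.randint(0, 1) for _ in range(LINES_PER_HEXAGRAM))


def hexagram_index(lines: tuple[int, ...]) -> int:
    """Read the six lines as a binary number in the range 0..63.

    This is a plain lookup index into a 64-entry chart, not the
    traditional numbering of the hexagrams.
    """
    return sum(line << position for position, line in enumerate(lines))


def render(lines: tuple[int, ...]) -> str:
    """Draw the hexagram as text, top line first."""
    glyphs = {1: "-------", 0: "--- ---"}
    return "\n".join(glyphs[line] for line in reversed(lines))


rng = random.Random()  # pass a seed, e.g. random.Random(42), for repeatable casts
lines = cast_hexagram(rng)
print(render(lines))
print("chart entry:", hexagram_index(lines))
```

The chart of interpretations is the “text” half of the machine: the random draw only picks which entry to read.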

Generative AI: Meaning in Randomness

Fast forward to the generative AI age: humans remain fascinated with sparking their imaginations through randomness. Our previous project, Waifu Labs, came out of the first wave of generative AI technologies and is a randomness-to-image machine.

When you increment the numbers one at a time, you can get a smooth, blended effect like this:

(you can read more about Waifu Labs here ⬆️ Arrowmancer: A History of Generative Art)
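As a toy illustration of why small steps in the input numbers give a smooth blend, the sketch below walks from one list of numbers to another in even steps. The `toy_generator` here is a made-up stand-in, not the Waifu Labs model; the only property it is assumed to share with a real image generator is that nearby inputs give nearby outputs.

```python
import math


def toy_generator(latent: list[float]) -> list[float]:
    """Hypothetical stand-in for an image model: maps numbers to 'pixels'.

    It is a smooth function, so small changes in the input produce
    small changes in the output.
    """
    return [0.5 + 0.5 * math.sin(x) for x in latent]


def interpolate(start: list[float], end: list[float], steps: int) -> list[list[float]]:
    """Return `steps` inputs moving evenly from `start` to `end`, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    frames = []
    for i in range(steps):
        t = i / (steps - 1)
        frames.append([(1 - t) * a + t * b for a, b in zip(start, end)])
    return frames


start = [0.0, 1.0, 2.0]
end = [1.0, 0.0, 3.0]
for latent in interpolate(start, end, steps=5):
    pixels = toy_generator(latent)
    print([round(p, 2) for p in pixels])
```

Each printed row differs only slightly from the one before it, which is what reads as a blend when the rows are images instead of three numbers.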

When diffusion technologies matured, we replaced the front part of the equation: We changed from randomness-to-image to text-to-image. Instead of sifting through random numbers to find our idea, we can now directly prompt for it:

Girl with blonde twintails, blue eyes, black armor, roses

In the prompt above, I didn’t specify what color the roses would be, and the AI made a suggestion. However, as many users who have tried to type in “heterochromia, two toned hair” may have realized, designing systems like this is a fine-grained tradeoff between control and serendipity: in other words, parsing user intent.

However, the paradox in this situation is this: language is the most direct way to indicate intent. (Think of how many dramas leave you sitting there thinking “this wouldn’t happen if the characters just talked to each other”.)

If text is already the best format of control, how do you get better than text-to-image?

Social Media: Sophisticated Serendipity

To solve the problem of intent, here is a different angle on the imagination problem: in order for humans to form ideas, they must consume existing ideas. We want our creative interface to suggest random ideas, but preferably ideas that we care about, rather than random snippets of intent from a celestial force in a pile of yarrow stalks.

Surprisingly, the solution has existed since before generative AI.

Social media algorithms don’t rely on you to type text to explain your intent. Instead, you give signals on various content, and the algorithm adjusts what it shows you based on those signals.

It works surprisingly well, without any generation involved:

Twitter/Tumblr: Text-to-text

Pinterest: Image-to-image

TikTok: Video-to-video

You may have heard this X-to-Y social media paradigm referred to as “The algorithm.” By collecting and organizing content according to your preferences, social media suggests more organized randomness for you.
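As a rough sketch of that loop, the toy feed below keeps one weight per tag, nudges it up or down on each signal, and re-sorts the candidates. Everything here (the class name, the tag-weight model, the step size of 1.0) is an assumption for illustration; real recommendation systems are far more sophisticated.

```python
class PreferenceFeed:
    """Toy feed that reorders items using like/skip signals on their tags."""

    def __init__(self, step: float = 1.0) -> None:
        self.step = step
        self.tag_weights: dict[str, float] = {}

    def signal(self, item_tags: set[str], liked: bool) -> None:
        """Raise the weight of every tag on a liked item, lower it otherwise."""
        delta = self.step if liked else -self.step
        for tag in item_tags:
            self.tag_weights[tag] = self.tag_weights.get(tag, 0.0) + delta

    def score(self, item_tags: set[str]) -> float:
        """Sum the learned weights of an item's tags (unknown tags count as 0)."""
        return sum(self.tag_weights.get(tag, 0.0) for tag in item_tags)

    def rank(self, items: dict[str, set[str]]) -> list[str]:
        """Return item names, best match for the user's signals first."""
        return sorted(items, key=lambda name: self.score(items[name]), reverse=True)


feed = PreferenceFeed()
feed.signal({"roses", "armor"}, liked=True)   # the user lingered on this one
feed.signal({"mecha"}, liked=False)           # the user skipped this one

candidates = {
    "knight": {"armor", "sword"},
    "robot": {"mecha"},
    "garden": {"roses", "armor"},
}
print(feed.rank(candidates))
```

The user never typed a word, yet the ordering now reflects what they care about.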

Today, it’s not just useful for casual surfing: social media is a staple of the modern creative industry. Collecting and categorizing our consumption is the way through which humans refine their ideas, and most feed algorithms today extend this behavior.

There is a great amount of sophistication (and money!) in the way that these apps help you sort through your thoughts.

Single item, negative signal

Apps like TikTok feature a single-width feed. In this type of app, users are asked to process every item in sequential order. The question that this type of app asks is:

“Given this item, do you like or dislike it?” If a user skips past a video, it means that they don’t like it.

Multiple item, positive signal

In comparison, an app that shows a multi-width feed, like Xiaohongshu, changes the sorting action. In this type of app, the question being asked is:

“Given this screen of items, pick the one that you like the best”

In this type of paradigm, a skip is not necessarily counted as a dislike: it’s just not as good as some of the other ones on the same page.
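The difference between the two questions shows up in what gets recorded. Here is a minimal sketch, with hypothetical function names and a made-up +1/-1 encoding; it is not how any particular app stores its signals.

```python
def single_item_signal(item: str, watched: bool) -> dict[str, int]:
    """Single-width feed: every item gets an explicit verdict.

    A watch counts as a like (+1) and a skip counts as a dislike (-1).
    """
    return {item: 1 if watched else -1}


def multi_item_signal(screen: list[str], picked: str) -> list[tuple[str, str]]:
    """Multi-width feed: a pick only says 'better than its neighbours'.

    Returns (preferred, other) pairs. Unpicked items receive no dislike;
    they simply lost one comparison on this screen.
    """
    if picked not in screen:
        raise ValueError("picked item must be on the screen")
    return [(picked, other) for other in screen if other != picked]


print(single_item_signal("video_a", watched=False))
print(multi_item_signal(["post_a", "post_b", "post_c"], picked="post_b"))
```

The first call yields an absolute judgment about one item; the second yields only relative comparisons, so an unpicked post can still win on another screen.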

Multiple item, conditional signal

And then there are apps like Pinterest, which ask you a different question entirely:

“Given this screen of options, which of these seem relevant AND where do they belong in your mental framework?”

It is an extension of the Xiaohongshu question, where you are asked both to pick things relevant to you and to find a category for them.
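In data terms, the conditional signal carries two pieces of information at once: which item was picked, and which category the user filed it under. A minimal sketch, assuming a hypothetical `Boards` class of our own invention:

```python
from collections import defaultdict


class Boards:
    """Toy model of 'pick an item AND say where it belongs'."""

    def __init__(self) -> None:
        # Maps a board (category) name to the items saved on it.
        self.boards: dict[str, list[str]] = defaultdict(list)

    def save(self, item: str, board: str) -> None:
        """Record that the user found `item` relevant to `board`."""
        if item not in self.boards[board]:
            self.boards[board].append(item)

    def boards_containing(self, item: str) -> list[str]:
        """List, alphabetically, every board the item was filed under."""
        return sorted(name for name, items in self.boards.items() if item in items)


mine = Boards()
mine.save("rose knight", board="armor ideas")
mine.save("rose knight", board="flowers")
mine.save("tea garden", board="flowers")
print(mine.boards_containing("rose knight"))
```

The same picture can mean different things to the same person, and the board name is what captures that context.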

These are all different ways that we have tried to express human intent through AI! The key part of this interaction is that the user isn’t asked to make their own content, but rather, the user expresses intent by making decisions about existing content.

It is the difference between knowing how to draw and knowing what kind of pictures you like! Generally people can’t do the former, but they are very good at doing the latter!

Sorting existing ideas is a powerful spark to imagination!

Magpies of Ideas

During the making of the niji mobile app, we asked ourselves whether we could use a similar method to spark inspiration in a generative AI workflow. We’ve taken care of the “content generation” portion with the AI image model, but we wondered whether we could guide users closer to their intention through clever interaction design.

The center of the niji mobile interaction is the collection of ideas. Prompts are cut up into clickable tags, which users can incorporate into their own prompts.

Mixing piecewise ideas

The piecewise tagging system allows users to break up concepts and rearrange them in interesting ways into their own prompt.
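A minimal sketch of the idea, assuming prompts are split on commas (the app’s actual rules for cutting up prompts are not described here, and the function names are hypothetical):

```python
def split_into_tags(prompt: str) -> list[str]:
    """Cut a comma-separated prompt into trimmed, non-empty tags."""
    return [tag.strip() for tag in prompt.split(",") if tag.strip()]


def add_tags(own_prompt: str, collected: list[str]) -> str:
    """Append collected tags to a prompt, skipping case-insensitive duplicates."""
    tags = split_into_tags(own_prompt)
    seen = {tag.lower() for tag in tags}
    for tag in collected:
        cleaned = tag.strip()
        if cleaned and cleaned.lower() not in seen:
            tags.append(cleaned)
            seen.add(cleaned.lower())
    return ", ".join(tags)


shared = "Girl with blonde twintails, blue eyes, black armor, roses"
tags = split_into_tags(shared)
print(tags)

# Borrow three pieces of someone else's prompt for our own character.
mine = add_tags("boy with silver hair, blue eyes", [tags[2], tags[3], tags[1]])
print(mine)
```

Because “blue eyes” is already in the user’s prompt, only “black armor” and “roses” are appended.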

Collecting ideas from the moment

Users can collect tags from the live stream, which shows a moment-to-moment heartbeat of what all the users are generating in the app.

Collecting ideas from a collection

If they don’t want a firehose of new ideas, users can collect tags from specific other users.

It’s undoubtedly still text-to-image, but we’ve made the text much more ALIVE.

What we are crafting here is the concept that ideas are environmental. The tags are as unobtrusive to the creation flow as markings on a building are to its function. But we cover every inch of the app with prompts that you can magpie away.

A certain videogame comes to mind when I think of this interaction.

In Dark Souls, players could scribble messages to each other in the environment.

This text is especially important to new users, who may not have as much experience with how text-to-image systems work. A new user stepping into the niji mobile app won’t need to learn how to use it from a tutorial; they’ll learn by looking at others prompting all around them.

Beyond Text

Of course, text is only the beginning. Our research team is fascinated with novel forms of interpreting user intent.

What if we could predict your personal preferences by asking you to rate pictures? What if we could pose a picture using your phone camera? It’s an unparalleled time in the field of imagination research. Please look forward to the new control schemes we are working on, in both our Discord and mobile offerings!

You can find niji・journey here:

As always, thank you for taking this journey with us!
