
The Exo Mastering Engine And Why You’ll Want To Know About It

Mastering is one of those terms many people have heard of but few really know what's behind it. Here we explain what it is and why you should care about it. You'll also learn what special mastering features we've built into api.audio so that you can get the most out of the tool.

Sam

Mastering is one of those things that most people know nothing about, and yet anybody who has access to media experiences mastering several times a day: every single music track, movie or TV show has been mastered. It’s one of the natural steps in audio production. And yes, it’s complicated, but don’t worry, we’ve got you covered.


Mastering is something audio nerds can talk about for hours, and sound engineers guard their knowledge of it like a treasure. Given that our goal at Aflorithmic is to make audio scalable, we had to give mastering some serious thought as well. And we had to automate it.


The Aflorithmic mastering engine (otherwise known as ExO) is the final component in our production chain. It’s responsible for joining together speech and sound, and provides that extra ‘shine’, ensuring your listeners have a sonorous listening experience.


We have been working hard over the last months not only to improve the speed at which we can serve high-fidelity audio, but also to add some amazing cutting-edge features. This article explores these in detail. But first...


What the heck is mastering?

Think of mastering as the Instagram filter of audio. It takes raw, unprocessed audio data and seamlessly mixes it together, while applying studio-grade effects to ensure that your audio sounds as good as it can. It also converts it into a consumer-grade format, i.e. one that can easily be played by your customers.

Why does this matter?

We are passionate audio enthusiasts here at Aflorithmic, and so we want to listen to high quality audio. Believe me, you will want that too. There is a reason why musicians and record labels invest months of time and a ton of money in good mastering, sometimes to the extent that they’ll do it over and over again. If you’ve ever listened to music that is 20 years old or older on Spotify or anywhere else, you’ll likely have noticed “Remastered” albums. All that means is that a team of sound engineers sat down and ‘updated’ the sound of an album or song so it sounds fuller and louder. This has a lot to do with how we consume audio these days: on mobile devices with small speakers or headphones.


We want anyone who uses our API to experience the best possible audio we can produce, and we want your customers to enjoy it too.

Audio file formats - What are they and why do they matter?

Not all audio is consumed in the same way. You’ll want some options depending on where you want to play your audio and how fast it needs to be served.


The ubiquitous mp3 is the standard for most people. What many people perhaps don’t know, however, is that the mp3 format is actually a range of formats, from highly compressed (i.e. tiny files) to higher-fidelity variants. We realize that not all customers need the same mp3, so we carefully designed 5 audio file quality settings for you:


mp3_very_low: A tiny format ideal for simple mastered speech, supplied as a mono file. Use it when speed is your only concern and quality comes second.

mp3_low: Mostly the same as above, but in stereo.

mp3_medium: This is the default and a good balance between quality and file size.

mp3_high: The best quality in most situations. This makes sense when you want high quality for storytelling formats and your listeners will use high quality playback equipment, such as studio headphones, smart speakers or home hi-fi setups.

mp3_very_high: For when you really want the best audio quality and file storage size is not a concern for you.


We also realize that not everyone wants an mp3, so we also support other file formats:


wav: Uncompressed audio that is perfect for further processing.

Flac: A slightly unorthodox choice that provides smaller file sizes without sacrificing audio fidelity.

Ogg: A format similar to mp3 but without licensing issues. 


When making a mastering call you can supply endFormat, which will change the type of file the mastering engine will produce for you.


import apiaudio

# Ask the mastering engine for an mp3 file
response = apiaudio.Mastering.create(
    scriptId="id-1234",
    endFormat=["mp3"]
)
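As a rough sketch of how you might pick a value for endFormat, the helper below maps a playback scenario to one of the settings listed above. The scenario names and the helper itself are hypothetical and not part of the apiaudio SDK; only the format names come from this article, and we assume they can be passed to endFormat as written.

```python
from typing import List

# Hypothetical mapping from a use case to a format name taken from this article.
# The scenario keys are made up purely for illustration.
FORMAT_BY_SCENARIO = {
    "fast_speech_only": "mp3_very_low",   # mono, smallest files
    "fast_stereo": "mp3_low",             # small files, stereo
    "general": "mp3_medium",              # the default balance
    "storytelling": "mp3_high",           # good speakers or headphones
    "archive": "mp3_very_high",           # storage size is not a concern
    "post_processing": "wav",             # uncompressed, for further editing
}


def choose_end_format(scenario: str) -> List[str]:
    """Return a list for endFormat, falling back to the default mp3_medium."""
    return [FORMAT_BY_SCENARIO.get(scenario, "mp3_medium")]
```

You could then pass the result straight into the call above, for example endFormat=choose_end_format("storytelling").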



Force length

Given how dynamic our audio services are, it can be a challenge to know exactly how long your audio creations will be. A variety of factors can affect this, for example break tags, speaker speed, language and of course the length of your script.


Let’s say you want to create an audio ad that needs to be exactly 30 seconds long. In that case it should really be exactly 30,000 milliseconds long. This is why we have created two awesome features to solve this problem.


The first and simpler of these is forceLength. This parameter takes a value in seconds and ensures that your creation is exactly this long, down to the millisecond. If there is not enough speech to fill this time, any supplied sound template will continue to loop (sound is often more desirable than silence). If, on the other hand, there is too much speech, a warning is returned (along with your audio) from your API request.
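To make the two outcomes concrete, here is a small sketch that models them in plain Python. It is only a mental model of the behaviour described above, not the mastering engine's implementation: the function name, the returned keys and the warning text are all hypothetical, and the article does not say what happens to the excess speech, so the sketch only flags it.

```python
def plan_force_length(speech_seconds: float, force_length_seconds: float) -> dict:
    """Model what a forced length means for a creation (illustrative only)."""
    target_ms = round(force_length_seconds * 1000)
    speech_ms = round(speech_seconds * 1000)

    if speech_ms > target_ms:
        # Too much speech: the API is described as returning a warning with the audio.
        overflow_ms = speech_ms - target_ms
        return {
            "target_ms": target_ms,
            "fill_ms": 0,
            "warning": f"speech exceeds forced length by {overflow_ms} ms",
        }

    # Not enough speech: the sound template keeps looping for the remaining time.
    return {"target_ms": target_ms, "fill_ms": target_ms - speech_ms, "warning": None}


plan = plan_force_length(speech_seconds=24.5, force_length_seconds=30)
print(plan)  # {'target_ms': 30000, 'fill_ms': 5500, 'warning': None}
```

In a real request you would simply pass forceLength (in seconds) to the mastering call and check the response for a warning.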


Sometimes you need more control and precision over your creations: a section may have to end at a very specific time, or a transition in your accompanying video advert may have to line up exactly with one of our awesome audio effects. Fear not - sectionProperties has you covered. This parameter allows you to specify the time at which a given section will end. It works in a similar way to forceLength.


However, to give you more control over how any gaps are filled, we (cough cough) borrowed an idea from the world of CSS and created audio justification. This allows your audio sources to move around within a section, producing a much more natural sounding audio asset. The justify values follow CSS-style naming, such as flex-start and centre in the example below. Imagine aligning your text in Microsoft Word: you can have it left-aligned, centered or right-aligned. Here is how this looks in api.audio:

import apiaudio

# Two sections, each tied to a sound segment and given a name
text = """
<<soundSegment::intro>><<sectionName::first>> Hello world and welcome to API dot audio.
<<soundSegment::main>><<sectionName::second>> Let's force the section length.
"""

# endAt sets the time at which each section ends; justify controls how gaps are filled
sectionProperties = {
    'first': {'endAt': 5, 'justify': 'flex-start'},
    'second': {'endAt': 10, 'justify': 'centre'},
}

script = apiaudio.Script().create(
    scriptText=text, scriptName="welcome"
)

speech = apiaudio.Speech().create(
    scriptId=script.get("scriptId"),
    voice="linda",
)

mastering = apiaudio.Mastering().create(
    scriptId=script.get("scriptId"),
    sectionProperties=sectionProperties
)
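If the justification idea is still a little abstract, the sketch below shows one way to think about it: given a section length and the length of the speech inside it, where does the speech start? This is an illustrative model only, not how the engine is implemented. The function is hypothetical, and the flex-end value is an assumption made by analogy with CSS; only flex-start and centre appear in the example above.

```python
def speech_offset(section_seconds: float, speech_seconds: float, justify: str) -> float:
    """Seconds of gap placed before the speech inside a section (illustrative)."""
    # The gap is whatever section time the speech does not fill.
    gap = max(section_seconds - speech_seconds, 0.0)

    if justify == "flex-start":
        return 0.0        # speech starts right away, gap comes after it
    if justify == "centre":
        return gap / 2    # gap is split evenly before and after the speech
    if justify == "flex-end":
        return gap        # assumed value: speech is pushed to the end of the section
    raise ValueError(f"unknown justify value: {justify!r}")


# A 5 second section containing 3 seconds of speech
print(speech_offset(5, 3, "flex-start"))  # 0.0
print(speech_offset(5, 3, "centre"))      # 1.0
```

In the example above, both sections would be 5 seconds long if the second section starts where the first one ends, so 3 seconds of centred speech in the second section would begin roughly 1 second in.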


Hybrid voices

Another service we provide for you is the ability to clone your voice and then have our infrastructure produce output from this through the API. Although not the focus of this blog post, several customers use a hybrid voice approach. Simply put, this combines the output of a synthetic text-to-speech process and a real recording of the same speaker to create an audio asset. We reengineered the mastering engine to analyse both sources and dynamically create a set of clever audio processes that ensures that the transition between these two mediums sounds seamless and natural. We have prepared a demo of this below using one of our pre-cloned voices. If you wish to know more about voice cloning and hybrid voices - please do get in touch :)

About Aflorithmic

Aflorithmic Labs, Ltd is a London/Barcelona-based technology company. The api.audio platform enables fully automated, scalable audio production by using synthetic media, voice cloning, and audio mastering, to then deliver it on any device, such as websites, mobile apps, or smart speakers.


With this Audio-As-A-Service, anybody can create beautiful-sounding audio, from simple text through to music and complex audio engineering, without any previous experience.


The team consists of highly skilled specialists in machine learning, software development, voice synthesizing, AI research, audio engineering, and product development.