Mastering is one of those things that most people know nothing about, and yet anyone with access to media experiences mastering several times a day: every single music track, movie, or TV show has been mastered. It’s one of the natural steps in audio production. And yes, it’s complicated, but don’t worry: we’ve got you covered.
Mastering is something that audio nerds can talk about for hours, and sound engineers guard their knowledge of it like a treasure. Given that at Aflorithmic our goal is to make audio scalable, we had to give mastering some serious thought as well. And by that, we mean we had to automate it.
The Aflorithmic mastering engine (otherwise known as ExO) is the final component in our production chain. It’s responsible for joining together speech and sound, and provides that extra ‘shine’, ensuring your listeners have a sonorous listening experience.
We have been working hard over the last months not only to improve the speed at which we can serve high-fidelity audio, but also to add some amazing cutting-edge features, which this article is going to explore in detail. But first…
What the heck is mastering?
Think of mastering as the Instagram filter of audio. It takes raw, unprocessed audio data and seamlessly mixes it together, while applying studio-grade effects, to ensure that your audio sounds as good as it possibly can. It also converts it into consumer-grade formats, i.e. formats that can be easily played by your audience.
Why does this matter?
We are passionate audio enthusiasts here at Aflorithmic, so naturally we only want to listen to high-quality audio. And believe me, you will want that too. There is a reason why musicians and record labels invest months of time, and a ton of money, in good mastering. Sometimes to the extent that they’ll do it over and over again: if you’ve ever listened to music that is 20 years old or more on Spotify or anywhere else, you’ll likely have noticed “Remastered” albums. All that means is that a team of sound engineers sat down and ‘updated’ the sound of an album or song so it sounds fuller and louder. This has a lot to do with how we consume audio these days: on mobile devices with small speakers or headphones.
We want anyone who uses our api.audio platform to experience the best possible audio we can produce, and as a customer of ours, we want your customers to enjoy it too.
Audio file formats - What are they and why do they matter?
Not all audio is consumed in the same way. You’ll want some options depending on where you want to play your audio and how fast it needs to be served.
The ubiquitous mp3 is the standard for most people. What many people don’t know, however, is that mp3 is actually a range of formats, from highly compressed (i.e. tiny files) to more high-fidelity variants. We realize that not all customers need the same type of mp3, so we carefully designed five different audio file quality settings for you. These are:
mp3_very_low: A tiny format ideal for simple mastered speech supplied as a mono file. Use it when speed is your only concern, and quality comes second.
mp3_low: Almost the same as above but in stereo.
mp3_medium: This is the default setting, and a good balance between quality and file size.
mp3_high: The best quality for most situations. Use it when you want high-quality audio, for example for storytelling formats where your listeners will use high-quality speakers, such as studio headphones, smart speakers, or home Hi-Fi setups.
mp3_very_high: For when you really want the best audio quality, and file storage size is not a concern.
mp3_alexa: This is a file format compatible with Alexa technology.
We also realize that not everyone wants an mp3, so we also support other file formats too:
wav: Uncompressed audio that is perfect for further processing.
flac: A slightly unorthodox choice that provides smaller file sizes without sacrificing audio fidelity.
ogg: A format similar to mp3 but without licensing issues.
When making a mastering call, you can supply endFormat, which changes the type of file the mastering engine will produce for you.
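To make this concrete, here is a minimal sketch of how a client might validate and build a mastering request body. Only the endFormat parameter and the preset names come from this article; the function name and the scriptId field are assumptions for illustration, not the real API surface.

```python
# Output presets documented above: five mp3 quality tiers, an
# Alexa-compatible variant, and three non-mp3 containers.
VALID_FORMATS = {
    "mp3_very_low", "mp3_low", "mp3_medium", "mp3_high",
    "mp3_very_high", "mp3_alexa", "wav", "flac", "ogg",
}

def build_mastering_payload(script_id: str, end_format: str = "mp3_medium") -> dict:
    """Build a hypothetical mastering request body, rejecting unknown formats."""
    if end_format not in VALID_FORMATS:
        raise ValueError(f"Unsupported endFormat: {end_format!r}")
    # mp3_medium is the documented default balance of quality and size.
    return {"scriptId": script_id, "endFormat": end_format}
```

Validating the preset name client-side like this saves a round trip when a typo such as "mp3_hi" slips into a config file.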
Time and Length Alignment
Given how dynamic our audio services are, it can be a challenge to know exactly how long your audio creations will be. A variety of factors can affect this, for example break tags, speaker speed, language, and of course the length of your script.
Let’s say you want to create an audio ad that needs to be exactly 30 seconds long. And by exactly, we mean exactly 30,000 milliseconds. That’s why we have created two awesome features to solve this problem.
The first, and simpler, option is forceLength. This parameter takes a value in seconds and ensures that your creation is exactly that long, down to the millisecond. If there is not enough speech to fill this time, any supplied sound template will continue to loop (sound is often more desirable than silence). If, on the other hand, there is too much speech, a warning is returned from your API request, along with your audio.
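The behaviour described above can be sketched in a few lines. This is an illustration of the concept, not Aflorithmic's implementation: the function name and return shape are made up, and durations are in milliseconds.

```python
def force_length(speech_ms: int, sound_loop_ms: int, target_ms: int) -> dict:
    """Pad a creation to an exact target by looping a background sound,
    or warn when the speech alone already overruns the target."""
    if speech_ms > target_ms:
        # Too much speech: the audio is still produced, with a warning.
        return {"duration_ms": speech_ms,
                "warning": "speech exceeds forceLength"}
    # Loop the sound template enough times to cover the whole target
    # duration (ceiling division), then trim to the exact millisecond.
    loops = -(-target_ms // sound_loop_ms)
    return {"duration_ms": target_ms, "sound_loops": loops, "warning": None}
```

For a 30-second ad with 25 seconds of speech and an 8-second sound template, the template loops four times and the result is trimmed to exactly 30,000 ms.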
Sometimes you need more control and precision over your creations: you may need a section to end at a very specific time, or a transition in your accompanying video advert to line up exactly with one of our awesome audio effects. Fear not, sectionProperties has you covered. This parameter lets you specify the exact time at which a given section will end, and it works in a similar way to forceLength.
However, to give you more control over how any gaps are filled, we (cough cough) borrowed an idea from the world of CSS and created audio justification. This lets your audio sources move around within a section, producing a much more natural-sounding audio asset. The three modes are flex-left, center, and flex-right: just like aligning text in Microsoft Word, you can have it left-aligned, centered, or right-aligned.
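A small sketch makes the three modes concrete. The mode names come from the article; the placement arithmetic below is our own illustration of the idea, assuming a clip shorter than its section and durations in milliseconds.

```python
def justify(section_ms: int, clip_ms: int, mode: str) -> int:
    """Return the start offset (ms) of a clip inside its section
    under one of the three audio justification modes."""
    gap = max(section_ms - clip_ms, 0)
    if mode == "flex-left":    # clip starts at the section start
        return 0
    if mode == "flex-right":   # clip ends at the section end
        return gap
    if mode == "center":       # clip sits in the middle of the section
        return gap // 2
    raise ValueError(f"Unknown justification mode: {mode!r}")
```

So a 10-second jingle inside a 30-second section starts at 0 ms with flex-left, 10,000 ms with center, and 20,000 ms with flex-right, with the surrounding gap filled by the sound template.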
Another service we provide is the ability to clone your voice and then have our infrastructure produce output from it through the API. Although not the focus of this blog post, several of our customers use a hybrid voice approach. Simply put, this combines the output of a synthetic text-to-speech process and a real recording of the same speaker to create a single audio asset. We re-engineered the mastering engine to analyse both sources and dynamically build a set of clever audio processes that ensure the transition between the two sounds seamless and natural. If you wish to know more about voice cloning and hybrid voices, don't hesitate to get in touch.
Aflorithmic Labs, Ltd is a London/Barcelona-based technology company. The api.audio platform enables fully automated, scalable audio production by using synthetic media, voice cloning, and audio mastering, and then delivers the result anywhere: websites, mobile apps, or smart speakers.
With this Audio-As-A-Service, anybody can create beautiful-sounding audio, from simple text all the way to productions including music and complex audio engineering, with no previous experience required.
The team consists of highly skilled specialists in machine learning, software development, voice synthesis, AI research, audio engineering, and product development.