Orian Sharoni
PhD Candidate at SCAI
When we’re researching how algorithms can do something in a human way, it allows us to ask: what is human?
Orian Sharoni is a PhD candidate at SCAI who is working on AI generation of live music. In our interview, she reflects on the challenges and rewards of transitioning to academia after working in the private sector, the most stimulating aspects of her current research, and the complex, necessary marriage of science and art.
Hi, Orian. Tell me about your academic background. How did it lead you to this international PhD fellowship with SCAI?
I have a master's degree from Tel Aviv University, where I studied a computational branch of cognitive science focused on modeling behavior. After my master's, I began working in the startup scene in Tel Aviv and a bit in the US.
I worked as a machine-learning researcher for several years and had my own consultancy company before I realized I missed academia enough to return. I began to look for a PhD program that I would want to give my energy to, and here I am!
What is your PhD topic?
My topic is called “Collective listening and learning agents for generative improvisation.”
To break this down, I am researching and building algorithms that will be able to play music with other algorithms or with humans in a live setting. This falls under the branch of generative AI, sometimes called Gen AI, which is a family of algorithms that can synthesize data. In my case, this data is music.
A very famous generative AI tool that we all know by now is ChatGPT.
Indeed, with the popularity of ChatGPT, almost everyone now has a certain familiarity with your field of work. How does it feel to be conducting such topical research?
It definitely feels like a very special and unique time to be researching in this field, because it's not often that someone gets to work on something with so much hype around it. Usually when you're in academia, you work on something that takes a long time to explain and legitimize: Is it important? Why are you doing it? And sure, I'm focused on a very specific angle in this field of development, but I remember the days of my master's, when it took me a long time to explain my work and get people to understand it.
Where does your interest in generating music come from?
Before I did my master's, I worked at a radio station for three years, and I also just love playing music for the fun of it. As part of my bachelor’s degree, my major was in musicology. Music has always been of interest to me.
For my PhD program, having done a lot of speech synthesis and speech-related machine learning research in industry, and having this musical background, I thought it was a great combination to bring to machine learning: building big neural networks that play music with people. Music is definitely a vast domain, and I want to be able to communicate with musicians and speak in a language we can all understand.
One of my PhD advisors actually made a point of saying that I need to keep on playing music for fun alongside my research, because I need to stay connected to what it feels like to play music. So alongside writing code every day, music is also a part of my daily life.
What made you choose to come to Sorbonne University?
I was certain that I wanted to do a PhD in an audio-related machine learning topic, ideally in the generative AI realm, so I spent time looking into the best programs and the best researchers I could find to come and do this with. When I saw SCAI announce that they were opening the first cohort of Sound.AI, it just sounded like an amazing program, and I am really happy to have made it and to be a part of it.
I am part of the research center Ircam, and I believe they're one of a handful of places that see the benefit of collaboration between art and science in a very meaningful way. That was a big part of my incentive to come here: to conduct the best machine learning and algorithmic research in an environment that cherishes and really appreciates how science can contribute to art, and also how art can develop with science.
What have you most enjoyed about being in Paris?
In Paris, I really notice the value of face-to-face, human encounters. It's the kind of city where you're encouraged to be out of the house and mingling. Science, in essence, is about telling a story. For me, that's done through writing code and experimenting with different algorithms, so I get a lot out of sitting with colleagues, sharing struggles, and asking questions. Even just having the opportunity to narrate and structure your research as a story is very meaningful to me.
I also started a journal club within Ircam, so that my colleagues and I can keep up with scientific literature and share our perspectives on it.
How has it been returning to academia after working in industry?
Well, in industry, when you're researching for private companies, it's very secretive. How they build their algorithms is a trade secret, and you can't share your work. Personally, I find a lot more value in the opportunity I have now to talk about what I'm doing in detail and to hear about what other people are doing in detail.
What is the biggest challenge in your current research?
I'm looking into live music generation in scenarios where humans are playing together, and that’s quite an intricate process. When someone plays music, they're listening to what they're doing, they’re thinking about the next thing they’re going to play, and at the same time, they’re listening to the music that’s coming from other people in the room. I don’t think we realize, as humans, how miraculous this is. We’re great at synchronizing and reading cues, even when we just talk. It’s amazing!
Creating algorithms that can generate this kind of communication with music, in a live setting, is definitely challenging. I'm focused on synchronization and beat, on being able to listen, to notice, and to be attentive to another person. That's why a portion of my research is not music generation but machine listening: investigating what it is we're doing when we choose whether we're leading or following someone's movement or beat. This is where it gets quite philosophical, because there are many theories about cognitive capability, and many questions about how humans operate are still being researched. When we're researching how algorithms can do something in a human way, it allows us to ask: what is human? In my research, when does it sound human enough? It raises questions about human perception from different directions, and I find that so fascinating.
What impact on society do you foresee for your work?
I think that music has a tremendous effect on us on many different levels, not just from the perspective of AI research but also culturally. How do advances in music change us culturally? At Ircam, the perspective is focused on enhancing the creativity of musicians by introducing new, innovative tools to facilitate their creations.
And this perspective specifically is important, because oftentimes when people think of AI, they think about replacing humans, and that is not at all the agenda we operate by. It's absolutely the opposite. No one is disputing the idea that the most fascinating creations remain human-made. Moreover, when I want to read a story or listen to a piece of music, I want to receive work that another human being made. I would love to give musicians more tools to create with, and this is where my contribution comes in: to enhance their creative process and potential, not replace it.