Working with Azure OpenAI on Semantic Kernel

Introduction

Semantic Kernel is an abstraction layer that connects conventional applications with AI services.

It is a free SDK that lets developers use AI models from different providers, such as OpenAI, Azure OpenAI, or other AI services, in their existing code through connectors. Connectors are modules that provide integrations for popular services.

Getting familiar with OpenAI in Semantic Kernel

Semantic Kernel comes with first-class support for Azure OpenAI and OpenAI services. Under the hood, it builds on the Azure OpenAI SDK.

Because the Azure OpenAI and OpenAI services are implemented on common abstractions, switching from an OpenAI service to an Azure OpenAI service is a one-line change.

// Create the kernel builder
var builder = Kernel.CreateBuilder();
// Azure OpenAI service (the string arguments are placeholders)
builder.AddAzureOpenAIChatCompletion("<deployment-name>", "<endpoint>", "<api-key>");
// OpenAI service: replace the line above with this one
// builder.AddOpenAIChatCompletion("<model-id>", "<api-key>");
var kernel = builder.Build();

Semantic Kernel currently supports a range of OpenAI models, including:

Language models like gpt-3.5-turbo, gpt-4, gpt-4-turbo and gpt-4-turbo-vision.
Image generation models like dall-e-3 and dall-e-2.
Text embedding models like text-embedding-ada-002.

Support for speech-to-text models like whisper is in progress and will soon be available.

The OpenAI API adds new features with each periodic update, and the Semantic Kernel SDK tries to integrate them as soon as possible while keeping its own API simple.

We can create and maintain a role-based chat history for our AI service using the ChatHistory class. Each entry pairs an author role (system, assistant, or user) with the message content.

    ChatHistory chatHistory = new ChatHistory()
    {
        new(AuthorRole.System, "You are a helpful chatbot"),
        new(AuthorRole.Assistant, "Hello, how can I help you?"),
        new(AuthorRole.User, "Hello")
    };
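
The same history can also be built step by step, which is closer to how a conversation actually grows. The following is a minimal sketch, assuming the Microsoft.SemanticKernel package is referenced; the messages are the ones used above.

```csharp
using System;
using Microsoft.SemanticKernel.ChatCompletion;

// Start the history with a system message, then append turns as they happen.
var chatHistory = new ChatHistory("You are a helpful chatbot");
chatHistory.AddAssistantMessage("Hello, how can I help you?");
chatHistory.AddUserMessage("Hello");

// ChatHistory is a list of messages, so it can be enumerated like any list.
foreach (var message in chatHistory)
{
    Console.WriteLine($"{message.Role}: {message.Content}");
}
```

Appending with the Add...Message helpers keeps the role explicit at every call site, which helps when messages arrive from different parts of an application.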

OpenAI execution settings

Setting up execution parameters for an API request in Semantic Kernel is fairly straightforward. Semantic Kernel V1 supports most of the execution parameters offered by the OpenAI API. Some newly introduced ones, like Seed and ResponseFormat, are still experimental.

OpenAIPromptExecutionSettings executionSettings = new OpenAIPromptExecutionSettings()
{
    MaxTokens = 100,
    Temperature = 0.7,
    FrequencyPenalty = 0.8,
    ResultsPerPrompt = 3,
    TopP = 0.9,
    StopSequences = new[] { "<end>" }, // a list of strings, not a single string
    ChatSystemPrompt = "You are an AI assistant",
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions,
    Seed = 1821212, // currently experimental
    ResponseFormat = "json_object", // currently experimental
};

MaxTokens: The maximum number of tokens the model may generate in response to a prompt. Tokens are chunks of text that are often shorter than a word, so 100 tokens is fewer than 100 words. This is a hard limit that can cut a response off mid-sentence, so set it according to the expected response size.
Temperature: Controls the randomness of the model’s output. A higher temperature (such as 1.0) makes the output more random and creative, while a lower temperature (such as the 0.7 used here) makes it more focused and predictable.
FrequencyPenalty: Penalizes tokens according to how often they have already appeared in the generated text. A higher value (0.8) discourages the model from repeating itself.
ResultsPerPrompt: The number of alternative completions the model should generate for a single prompt. If set to 3, three different responses are generated for the same request.
TopP: Also known as nucleus sampling, this parameter influences the diversity of the model’s output. The model samples only from the most probable tokens whose cumulative probability reaches the threshold, so higher values (0.9) allow more diverse word choices.
StopSequences: Sequences that, when generated, signal the model to stop generating further content. They can be anything that suits your use case, such as a period ".", a keyword like STOPGEN, or a fine-tuned token that marks the end of a response.
ChatSystemPrompt: The initial prompt or context provided to the model. If no system prompt is provided in either OpenAIPromptExecutionSettings or the ChatHistory, it defaults to “Assistant is a large language model.”
ToolCallBehavior: Determines how kernel functions are invoked. If set to AutoInvokeKernelFunctions, the kernel invokes the functions requested by the model automatically. Other options give more advanced control over function calling.
Seed: An experimental parameter that influences the randomness of the model’s output. If specified, the system tries to make results deterministic for repeated requests with the same seed and parameters. However, deterministic results are not guaranteed.
ResponseFormat: Another experimental parameter that specifies the format in which the model’s response should be returned. Set it to json_object to get responses in JSON format only.
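
To build some intuition for FrequencyPenalty, Temperature and TopP, here is a toy sketch of how such parameters typically reshape a probability distribution over next tokens. It is not Semantic Kernel or OpenAI code: the hosted models apply these steps on the server, their exact implementation is not public, and every function name and number below is a hypothetical illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// FrequencyPenalty: lower each token's logit in proportion to how often
// that token has already appeared in the generated text.
static double[] ApplyFrequencyPenalty(double[] logits, int[] counts, double penalty) =>
    logits.Select((logit, i) => logit - counts[i] * penalty).ToArray();

// Temperature: divide the logits by the temperature before the softmax.
// Values below 1 sharpen the distribution; values above 1 flatten it.
static double[] Softmax(double[] logits, double temperature)
{
    double max = logits.Max(); // subtract the max for numerical stability
    double[] exps = logits.Select(l => Math.Exp((l - max) / temperature)).ToArray();
    double sum = exps.Sum();
    return exps.Select(e => e / sum).ToArray();
}

// TopP: keep the most probable tokens until their cumulative probability
// reaches topP, then renormalize over the tokens that were kept.
static Dictionary<int, double> TopPFilter(double[] probs, double topP)
{
    var kept = new List<(int Index, double P)>();
    double cumulative = 0;
    foreach (var item in probs.Select((p, i) => (Index: i, P: p)).OrderByDescending(x => x.P))
    {
        kept.Add(item);
        cumulative += item.P;
        if (cumulative >= topP) break;
    }
    double total = kept.Sum(x => x.P);
    return kept.ToDictionary(x => x.Index, x => x.P / total);
}

// Made-up logits for a four-token vocabulary.
double[] logits = { 2.0, 1.0, 0.5, -1.0 };
int[] counts = { 3, 0, 0, 0 }; // token 0 has already been generated three times

double[] penalized = ApplyFrequencyPenalty(logits, counts, 0.8);
double[] probs = Softmax(penalized, 0.7);
Dictionary<int, double> nucleus = TopPFilter(probs, 0.9);

foreach (var pair in nucleus)
{
    Console.WriteLine($"token {pair.Key}: {pair.Value:F3}");
}
```

In this toy run, the penalty pushes the repeated token 0 below tokens 1 and 2, and the top-p cutoff removes the least likely token from the candidate set entirely.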

While these options let us customize a request, not every model supports every parameter. For example, the GPT-4 vision model currently does not support ResultsPerPrompt and is limited to one response per request.

Make sure to check the supported parameters for specific models in the latest OpenAI documentation before using them.

Here’s a simple example of sending a request with a ChatHistory and execution settings.

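using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var builder = Kernel.CreateBuilder();
// Select an AI service (the string arguments are placeholders)
builder.AddAzureOpenAIChatCompletion("<deployment-name>", "<endpoint>", "<api-key>");
// builder.AddOpenAIChatCompletion("<model-id>", "<api-key>");
var kernel = builder.Build();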

// Set the prompt settings
OpenAIPromptExecutionSettings executionSettings = new()
{
    MaxTokens = 100,
    Temperature = 0.7,
    ResultsPerPrompt = 3,
};

// Set up chat history
ChatHistory chatHistory = new ChatHistory()
{
    new(AuthorRole.System, "You are a helpful chatbot"),
    new(AuthorRole.Assistant, "Hello, how can I help you?")
};

// Add user message
string userMessage = "Hello";
chatHistory.AddUserMessage(userMessage);

// Get registered AI service from the kernel
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();

// Get the AI service responses (one per requested result)
foreach (var chatMessageChoice in await chatCompletionService.GetChatMessageContentsAsync(chatHistory, executionSettings, kernel))
{
    Console.Write(chatMessageChoice.Content ?? string.Empty);
    Console.WriteLine("\n-------------\n");
}
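
To maintain the chat history across several turns, each reply has to be added back to the history before the next request. The following is a minimal sketch of that loop, not a definitive implementation. It assumes the OpenAI connector, a gpt-4 model id, an API key in an OPENAI_API_KEY environment variable, and two hard-coded user messages; all of these are illustrative choices.

```csharp
using System;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var chatHistory = new ChatHistory("You are a helpful chatbot");
var executionSettings = new OpenAIPromptExecutionSettings { MaxTokens = 100, Temperature = 0.7 };

// Assumption: the key comes from an environment variable rather than source code.
string? apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
if (string.IsNullOrEmpty(apiKey))
{
    Console.WriteLine("Set OPENAI_API_KEY to run this sketch.");
}
else
{
    var builder = Kernel.CreateBuilder();
    builder.AddOpenAIChatCompletion("gpt-4", apiKey);
    var kernel = builder.Build();
    var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();

    // Hypothetical user turns, hard-coded to keep the sketch short.
    foreach (string userMessage in new[] { "Hello", "What can you help me with?" })
    {
        chatHistory.AddUserMessage(userMessage);

        // Request a single reply for the conversation so far.
        ChatMessageContent reply = await chatCompletionService.GetChatMessageContentAsync(
            chatHistory, executionSettings, kernel);

        // Store the reply so the next request includes it as context.
        chatHistory.Add(reply);
        Console.WriteLine($"{reply.Role}: {reply.Content}");
    }
}
```

Because the whole history is sent with every request, long conversations consume more tokens per call, so it can be worth trimming or summarizing older turns.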

Wrapping Up

Semantic Kernel gives developers a flexible way to integrate AI services into their applications. With built-in support for both Azure OpenAI and OpenAI services, developers can switch between these platforms with minimal code changes.

Together with ChatHistory and the OpenAI execution settings, this makes it straightforward to add chat-based AI capabilities to existing code.