RCE Endeavors

May 18, 2021

Creating a multi-language compiler system: Containerization (7/11)

Filed under: Programming — admin @ 10:29 PM


This post will discuss how to containerize the multi-language compiler system. As far as functionality goes, containerization is not strictly needed: the system as described up to this point is complete and fully functional. It is, however, a nice-to-have, since it provides a consistent environment for the compiler system to run in. Additionally, containerizing the system allows for a more flexible architecture, since each language can run in its own container. When combined with an orchestration platform like Kubernetes, the architecture becomes even more powerful, as these containers can have replicas and autoscaling.

The system will be containerized via Docker, with each language isolated into its own Dockerfile. These individual language Dockerfiles extend a general-purpose Dockerfile containing the features common to all environments. You can think of this as a similar approach to the one taken with the file watcher’s Bash scripts: a common base extended by language-specific pieces.

Top-level Dockerfile

This is the base Dockerfile that the various language-specific ones extend. This Dockerfile is responsible for:

  • Adding packages common to all images
  • Compiling the file watcher component code
  • Compiling the execution component code
  • Setting up the user and execution environments and directories

To provide a degree of isolation and some security, the execution component will run as a different, lower-privileged user than the file watcher. Read/write/execute permissions are lowered for the execution environment as well. This helps a bit from a security standpoint, although it is definitely not foolproof. The top-level Dockerfile is provided below:

FROM n0madic/alpine-gcc:9.2.0

RUN apk add --update --no-cache su-exec inotify-tools build-base busybox-suid sudo

# Setup user
ARG USER=user
ENV HOME=/home/${USER}
ENV EXEC=exec
ENV EXEC_HOME=/home/${EXEC}

ENV CODE_PATH=/home/${USER}/code
ENV EXEC_PATH=/home/${EXEC}/code

RUN mkdir ${HOME}
RUN mkdir ${CODE_PATH}

RUN mkdir ${EXEC_HOME}
RUN mkdir ${EXEC_PATH}

ADD agentshared ${CODE_PATH}

# Build file watcher code
RUN g++ -std=c++17 -o ${CODE_PATH}/agent -I ${CODE_PATH}/builtin/code/agent/thirdparty/cereal/include \
    -I ${CODE_PATH}/builtin/code/agent/thirdparty/thread_pools/include \
    ${CODE_PATH}/builtin/code/agent/src/*.cpp \
    ${CODE_PATH}/builtin/code/agent/src/Agent/Notify/*.cpp \
    -lstdc++fs -pthread

# Build executor code
RUN g++ -std=c++17 -o ${EXEC_PATH}/executor ${CODE_PATH}/builtin/code/executor/src/Source.cpp -lstdc++fs -pthread

RUN mv ${CODE_PATH}/builtin/scripts/startup.sh ${CODE_PATH}
RUN mv ${CODE_PATH}/builtin/scripts/shutdown.sh ${CODE_PATH}
RUN mv ${CODE_PATH}/builtin/config/config.json ${CODE_PATH}
RUN rm -rf ${CODE_PATH}/builtin

RUN adduser -D ${USER} && echo "$USER ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/$USER && chmod 0440 /etc/sudoers.d/$USER
RUN adduser -D ${EXEC}

RUN sudo passwd -d root
RUN sudo passwd -d ${USER}

RUN echo 'user ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers

RUN chown -R ${USER}:${USER} ${HOME}
RUN chmod -R 751 ${HOME}

RUN chown -R ${EXEC}:${EXEC} ${EXEC_HOME}
RUN chmod -R 555 ${EXEC_PATH}

The startup.sh and shutdown.sh scripts referenced in the Dockerfile are shell scripts that, as their names suggest, are invoked at startup and shutdown. The startup script is responsible for setting up the appropriate folders when a container is launched. Its content is shown below:

#!/bin/bash

CONTAINER_ID=$(basename $(cat /proc/1/cpuset))

create_folders () {
    mkdir ${CODE_PATH}/share/${LANGUAGE}/input/${CONTAINER_ID}
    mkdir ${CODE_PATH}/share/${LANGUAGE}/output/${CONTAINER_ID}
    mkdir ${CODE_PATH}/share/${LANGUAGE}/workspace/${CONTAINER_ID}
    mkdir ${CODE_PATH}/share/${LANGUAGE}/arguments/${CONTAINER_ID}
    mkdir ${CODE_PATH}/share/${LANGUAGE}/stdin/${CONTAINER_ID}
}

fix_config () {
    sed -i "s|\${CODE_PATH}|${CODE_PATH}|g" ${CODE_PATH}/config.json
    sed -i "s|\${UNIQUE_ID}|${CONTAINER_ID}|g" ${CODE_PATH}/config.json
    sed -i "s|\${LANGUAGE}|${LANGUAGE}|g" ${CODE_PATH}/config.json
    sed -i "s|\${SUPPORTED_LANGUAGES}|\"${SUPPORTED_LANGUAGES}\"|g" ${CODE_PATH}/config.json
    sed -i "s|\${IS_MULTITHREADED}|${IS_MULTITHREADED}|g" ${CODE_PATH}/config.json
}

start_agent () {
    ./agent config.json
}

launch_dotnet () {
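    # Warm the dotnet toolchain as both the watcher and executor users.
    # (Presumably this is only meaningful in the C# image and fails
    # harmlessly in images without the dotnet runtime.)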
    dotnet run
    sudo su-exec exec dotnet run
}

main () {
    launch_dotnet
    create_folders
    fix_config
    start_agent
}

main

Since multiple containers can be launched, each container needs its own isolated environment. This is handled by the create_folders function. The fix_config function is responsible for setting up the configuration that the file watcher will use. Once the appropriate folders have been created and the configuration substitutions made, the file watcher can be launched via the start_agent function.
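
To make the substitutions concrete, here is what fix_config does to one line of the config.json template from the file watcher’s configuration; the values below are made up for illustration:

# Hypothetical values; at runtime these come from the image environment
# and from /proc/1/cpuset
CODE_PATH=/home/user/code
LANGUAGE=java
CONTAINER_ID=3f9ab21c77de

echo '"inputpath": "${CODE_PATH}/share/${LANGUAGE}/input/${UNIQUE_ID}"' |
    sed -e "s|\${CODE_PATH}|${CODE_PATH}|g" \
        -e "s|\${UNIQUE_ID}|${CONTAINER_ID}|g" \
        -e "s|\${LANGUAGE}|${LANGUAGE}|g"
# Output: "inputpath": "/home/user/code/share/java/input/3f9ab21c77de"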

The opposite of this process happens on a container shutdown. The created folders are deleted, any unprocessed input is relocated, and the file watcher is shut down. The code for this is shown below:

#!/bin/bash

CONTAINER_ID=$(basename $(cat /proc/1/cpuset))

kill_agent () {
    killall -9 agent
}

relocate_input () {
    mkdir ${CODE_PATH}/share/relocate/input/${CONTAINER_ID}
    mkdir ${CODE_PATH}/share/relocate/arguments/${CONTAINER_ID}
    mv ${CODE_PATH}/share/${LANGUAGE}/input/${CONTAINER_ID}/* ${CODE_PATH}/share/relocate/input/${CONTAINER_ID}/
    mv ${CODE_PATH}/share/${LANGUAGE}/arguments/${CONTAINER_ID}/* ${CODE_PATH}/share/relocate/arguments/${CONTAINER_ID}/
}

delete_folders () {
    rm -rf ${CODE_PATH}/share/${LANGUAGE}/input/${CONTAINER_ID}
    rm -rf ${CODE_PATH}/share/${LANGUAGE}/output/${CONTAINER_ID}
    rm -rf ${CODE_PATH}/share/${LANGUAGE}/workspace/${CONTAINER_ID}
    rm -rf ${CODE_PATH}/share/${LANGUAGE}/arguments/${CONTAINER_ID}
    rm -rf ${CODE_PATH}/share/${LANGUAGE}/stdin/${CONTAINER_ID}
}

main () {
    relocate_input
    delete_folders
    kill_agent
}

main

Language-specific Dockerfiles

The language-specific Dockerfiles are much smaller since they contain only the additional functionality needed for a particular language environment. This is usually just the packages or runtimes needed for the compiler/interpreter to run. The Java Dockerfile is shown below as an example:

FROM compiler-base-alpine:latest

# Install Java
RUN apk add --update --no-cache openjdk11 --repository=http://dl-cdn.alpinelinux.org/alpine/edge/community

# Setup Java environment
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk

ARG USER=user
ENV CODE_PATH=/home/${USER}/code

# Setup language(s)
ENV LANGUAGE=java
ENV SUPPORTED_LANGUAGES=java
ENV IS_MULTITHREADED=true

# Setup PATH
ENV PATH=/usr/lib/jvm/java-11-openjdk/bin:${PATH}

USER ${USER}
WORKDIR ${CODE_PATH}

CMD ["./startup.sh"]

This Dockerfile extends the base one by installing OpenJDK and setting up the environment variables and PATH needed for Java to be invoked from the command line. Dockerfiles for other languages follow a similar pattern. The script to build these images is provided below. It builds the image for each supported language, tags it as latest, and pushes it to the local Docker registry.

#!/bin/bash

languages="alpine c cpp cs java py"

for language in $languages; do
    echo "Building image for ${language}"
    sudo docker build -t compiler-base-${language} -f Dockerfiles/${language}/Dockerfile .
    echo "Tagging image for ${language}"
    sudo docker tag compiler-base-${language}:latest localhost:32000/compiler-base-${language}:latest
    echo "Pushing image for ${language}"
    sudo docker push localhost:32000/compiler-base-${language}:latest
done
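
With the images built and pushed, a single language container can be spun up for a quick test. A minimal sketch of such an invocation is shown below; the image name comes from the build script above, while the host-side path (/srv/share) is an assumption and must match the share directory layout that startup.sh expects:

# Hypothetical host path; the mount target corresponds to ${CODE_PATH}/share
sudo docker run -d \
    --name compiler-java \
    -v /srv/share:/home/user/code/share \
    localhost:32000/compiler-base-java:latest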

At this point each language has been containerized and has its own defined environment for the compiler system to run in. The next post will cover how these images work in combination with Kubernetes to provide resiliency and scaling.

Creating a multi-language compiler system: Execution Engine (6/11)

Filed under: Programming — admin @ 10:29 PM


This post will cover the execution component of the multi-language compiler. If you’re still reading, then congratulations! This is the last component of the system, and not too complicated a one at that. The execution component has two tasks: provide input (if necessary) and capture output. The approach taken here is to connect a pseudoterminal to the executing process so that input and output can easily be captured by reading from and writing to those streams.

The execution component splits these tasks up into individual threads:

  • Main thread: connect the pseudoterminal, execute the code, and wait for the process to exit or the timeout to expire.
  • Reader thread: select on the pseudoterminal file descriptor and write the output to the output file.
  • Writer thread: monitor the stdin input file via inotify. On a change, read that change and write it to the stdin of the pseudoterminal file descriptor.

Main thread

The code to connect the pseudoterminal, launch the child process, and wait for its return is pretty straightforward. These steps are all done with standard system APIs, namely forkpty, execv, and waitpid. The snippet to accomplish this is shown below:

    ....
    int masterFileDescriptor = -1;
    pid_t childProcessId = forkpty(&masterFileDescriptor, NULL, NULL, NULL);
    int childReturnCode = -1;

    if (childProcessId == -1)
    {
        perror("forkpty");
        exit(-1);
    }

    if (childProcessId == 0)
    {
        int result = execv(options.BinaryToExecute.c_str(), options.BinaryArguments.data());
        exit(result);
    }
    else
    {
        std::thread(ListenForOutput, masterFileDescriptor, options.OutputFilePath)
            .detach();

        if (options.IsInteractive)
        {
            std::thread(ListenForInput, masterFileDescriptor, options.InputFilePath)
                .detach();
        }

        childReturnCode = WaitForCloseOrTimeout(childProcessId, options.MaxWaitTimeMs);
    }

    return childReturnCode;
}

What is shown in the code is exactly what was described above: the execution component creates a new process with a pseudoterminal attached. Any command-line arguments are passed to this process, and then the thread that listens for output is launched. If this is an interactive session (the user can provide stdin input at runtime), the thread that listens for input is launched as well. The process then runs and returns its return code to the execution component, which subsequently returns it to the script that invoked it.

The WaitForCloseOrTimeout function is just a wrapper around waitpid that polls for the child exit code up to a maximum timeout value. If the timeout is hit, the child process is killed and 124 is returned as the timeout exit code; otherwise, if the process exits within the allotted time, its exit code is returned. The WaitForCloseOrTimeout function is shown below:


int WaitForCloseOrTimeout(const pid_t childProcessId, const int maxWaitTimeMs)
{
    int childReturnCode = -1;

    constexpr int sleepTimeMicroseconds = 100000;
    int elapsedTimeMs = 0;
    bool timeoutExpired = false;
    bool childExited = false;
    while (!timeoutExpired && !childExited)
    {
        int result = waitpid(childProcessId, &childReturnCode, WNOHANG);
        if (result == -1)
        {
            perror("waitpid");
        }
        if (result == childProcessId)
        {
            childExited = true;
        }

        usleep(sleepTimeMicroseconds);
        elapsedTimeMs += sleepTimeMicroseconds / 1000;
        timeoutExpired = (elapsedTimeMs >= maxWaitTimeMs);
    }

    if (timeoutExpired)
    {
        constexpr int timeoutReturnCode = 124;
        childReturnCode = timeoutReturnCode;
        kill(-childProcessId, SIGTERM);
    }

    return childReturnCode;
}

Reader thread

The reader thread is as straightforward as can be: in a loop we read the output and write it to a file.

void ListenForOutput(const int masterFileDescriptor, const std::string outputFilePath)
{
    g_outputFile.rdbuf()->pubsetbuf(0, 0);
    g_outputFile.open(outputFilePath, std::ios::out);

    constexpr int BUFFERSIZE = 1024;
    std::array<char, BUFFERSIZE> buffer;
    fd_set fileDescriptors = { 0 };

    while (true)
    {
        FD_ZERO(&fileDescriptors);
        FD_SET(masterFileDescriptor, &fileDescriptors);

        if (select(masterFileDescriptor + 1, &fileDescriptors, NULL, NULL, NULL) > 0)
        {
            auto bytesRead = read(masterFileDescriptor, buffer.data(), buffer.size());
            if (bytesRead > 0)
            {
                g_outputFile.write(buffer.data(), bytesRead);
            }
        }
    }
}

Writer thread

The writer thread is a bit more complex. This thread needs to monitor the stdin file that contains the state of the interactive session. When a write is performed to this session file, the thread needs to read what was written and forward it to the stdin of the execution process. Since the session file can be written to multiple times, the offset of the last read must be tracked. The full logic is:

  • Add an inotify watch on the interactive session file for IN_CLOSE_WRITE events
  • On a close write event, read the file from the last offset to the end of file
  • Write this read data to the stdin of the executing process

The code snippet to accomplish this is shown below:

    ...
    if (pEvent->mask & IN_CLOSE_WRITE)
    {
        std::string fileName = pEvent->name;

        if (fileName == stdinFileName)
        {
            std::ifstream file(inputFileFullPath, std::ios::in | std::ios::binary);

            std::vector<char> contents;
            file.seekg(0, std::ios::end);
            contents.reserve(static_cast<size_t>(file.tellg()) - lastReadOffset);
            file.seekg(lastReadOffset, std::ios::beg);

            contents.assign((std::istreambuf_iterator<char>(file)),
                std::istreambuf_iterator<char>());

            lastReadOffset += contents.size();

            size_t bytesWritten = 0;
            do
            {
                ssize_t written = write(masterFileDescriptor, contents.data() + bytesWritten, contents.size() - bytesWritten);
                if (written == -1)
                {
                    perror("write");
                    break;
                }
                bytesWritten += written;
            } while (bytesWritten < contents.size());
        }
    }
    ...
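
From the user’s side, driving an interactive session amounts to appending to the stdin session file and closing it; each close-write generates a new event for this thread to act on. A hypothetical session, with made-up paths, might look like:

# Each echo opens, appends to, and closes the session file, producing
# one IN_CLOSE_WRITE event per line of input (paths are hypothetical)
echo "first line of input"  >> /home/user/code/share/c/stdin/3f9ab21c77de/program.c.stdin
echo "second line of input" >> /home/user/code/share/c/stdin/3f9ab21c77de/program.c.stdin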

And that’s all there is to it. At this point the entire system has been described end-to-end: from the time the user adds a source file to the input folder to how they get a response back for their executed program. The next series of posts will cover how to refine the system a bit further from a deployment perspective; namely, how to containerize the code using Docker and how to provide some resiliency using Kubernetes.

Creating a multi-language compiler system: File Watcher, Bash (5/11)

Filed under: Programming — admin @ 10:29 PM


The previous post covered the first half of the file watcher component: the background C++ process responsible for monitoring the input file directory and starting the compilation and execution process. This next part will cover the helper Bash script that actually does the work. This script is composed of two parts: the main script, a general script responsible for creating and cleaning up the directories that will be used, and language-specific scripts responsible for running the compilation and execution process.

Main script

The main script is what is invoked by the background process. The path to and name of this script are configured via the configuration file:

"bootstrappath": "${CODE_PATH}/share/bootstrap",
"bootstrapscriptname": "bootstrap-common.sh",
...

The background process builds the command to invoke this script (see NotifyChildProcess::buildCommand in the previous post), and captures its output. The main script is shown in its entirety below:

#!/bin/bash

POSITIONAL=()
while [[ $# -gt 0 ]]
do
key="$1"

case $key in
    -f|--filepath)
    FILE_PATH="$2"
    shift
    shift
    ;;
    -a|--arguments)
    ARGUMENTS_PATH="$2"
    shift
    shift
    ;;
    -i|--index)
    INDEX="$2"
    shift
    shift
    ;;
    -d|--dependenciespath)
    DEPENDENCIES_PATH="$2"
    shift
    shift
    ;;
    -w|--workspacepath)
    WORKSPACE_PATH="$2"
    shift
    shift
    ;;
    -o|--outputpath)
    OUTPUT_PATH="$2"
    shift
    shift
    ;;
    -s|--stdinPath)
    STDIN_PATH="$2"
    shift
    shift
    ;;
    -t|--timeout)
    INTERACTIVE_TIMEOUT="$2"
    shift
    shift
    ;;
    -l|--language)
    LANGUAGE="$2"
    shift
    shift
    ;;
    *)
    
    POSITIONAL+=("$1")
    shift
    ;;
esac
done
set -- "${POSITIONAL[@]}"

FILE_NAME="$(basename ${FILE_PATH})"
ARGS_FILE_EXISTS=false
STDIN_FILE_EXISTS=false

OUTPUT_NAME=${WORKSPACE_PATH}/${INDEX}/${FILE_NAME}

if [ -f "${ARGUMENTS_PATH}" ]; then
    ARGS_FILE_EXISTS=true
fi

if [ -f "${STDIN_PATH}" ]; then
    STDIN_FILE_EXISTS=true
fi

create_directories () {
    rm -rf ${WORKSPACE_PATH}/${INDEX}
    mkdir ${WORKSPACE_PATH}/${INDEX}
}

cleanup () {
    rm ${FILE_PATH}
    if [ "${ARGS_FILE_EXISTS}" = "true" ]; then
        rm ${ARGUMENTS_PATH}
    fi
    if [ "${STDIN_FILE_EXISTS}" = "true" ]; then
        rm ${STDIN_PATH}
    fi
    rm -rf ${WORKSPACE_PATH}/${INDEX}
}

copy_dependencies () {
    cp ${FILE_PATH} ${WORKSPACE_PATH}/${INDEX}
    cp -r ${DEPENDENCIES_PATH}/. ${WORKSPACE_PATH}/${INDEX}
}

move_output () {
    rm -rf ${OUTPUT_PATH}/${INDEX}
    mkdir ${OUTPUT_PATH}/${INDEX}
    mv ${OUTPUT_NAME}-output.log ${OUTPUT_PATH}/${INDEX}
}

main () {
    create_directories
    copy_dependencies
    (cd ${CODE_PATH}/share/${LANGUAGE}/bootstrap/${LANGUAGE}; source ./bootstrap.sh; run_command; exit ${result})
    result=$?
    move_output
    cleanup
    
    exit ${result}
}

main

Despite the large size of the script, half of it is just argument parsing and storage. The other half is the actual functionality: creating the isolated workspace directory in which compilation and execution take place (the create_directories function), copying the dependencies to this workspace directory (the copy_dependencies function), and moving the output to the output directory and cleaning up (the move_output and cleanup functions). The run_command function is the language-specific function that gets invoked from its respective language directory.
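
For reference, the command that the background process builds to invoke this script (see NotifyChildProcess::buildCommand) ends up looking roughly like the following; all paths, the index, and the container ID are hypothetical:

# Hypothetical invocation; the real values come from the configuration
# file and the inotify event
/home/user/code/share/bootstrap/bootstrap-common.sh \
    -f /home/user/code/share/c/input/3f9ab21c77de/program.c \
    -a /home/user/code/share/c/arguments/3f9ab21c77de/program.c.args \
    -s /home/user/code/share/c/stdin/3f9ab21c77de/program.c.stdin \
    -t 600s \
    -i 0 \
    -d /home/user/code/share/c/dependencies \
    -w /home/user/code/share/c/workspace/3f9ab21c77de \
    -o /home/user/code/share/c/output/3f9ab21c77de \
    -l c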

Language-specific scripts

Each language has its own way to go from source code to executable. Compiled languages require a compilation step to generate an executable, while interpreted ones like Python can have the interpreter run directly on the source code. As a result, each language has its own language-specific script responsible for implementing this functionality.

For example, the run_command implementation for C files looks like this:

#!/bin/bash

run_command () {

    TIMEOUT_SECONDS_COMPILE=15s
    TIMEOUT_SECONDS_RUN=10s

    timeout ${TIMEOUT_SECONDS_COMPILE} gcc -Wall -std=c17 -Wno-deprecated ${OUTPUT_NAME} -o ${OUTPUT_NAME}.out >> ${OUTPUT_NAME}-output.log 2>&1
    result=$?
    
    if [ $result -eq 0 ]
    then
        chmod 753 ${OUTPUT_NAME}-output.log
        if [ "${ARGS_FILE_EXISTS}" = "true" ]; then
            ARGUMENTS=$(cat ${ARGUMENTS_PATH})
        fi
        if [ "${STDIN_FILE_EXISTS}" = "true" ]; then
            TIMEOUT_SECONDS_RUN=${INTERACTIVE_TIMEOUT}
            STDIN_ARGUMENTS="-s ${STDIN_PATH}"
        fi
        
        sudo su-exec exec ${EXEC_PATH}/executor -t ${TIMEOUT_SECONDS_RUN} ${STDIN_ARGUMENTS} -o ${OUTPUT_NAME}-output.log -f ${OUTPUT_NAME}.out ${ARGUMENTS}
        result=$?
    fi
}

This function begins by invoking the GCC compiler to create an executable. The output of this step is captured via stream redirection into an output file, so if the compilation fails, the cause will still be captured and presented to the user. Once the executable is generated, the execution component is called to run it. Both steps come with a timeout value in order to prevent issues with hanging compiler processes or infinitely running executables.
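
For comparison, an interpreted language has no compilation step, so its script collapses to a single call into the execution component. The following is a sketch of what a Python run_command might look like; it is not the exact script from the system. The interpreter path is an assumption, as is the detail that everything after -f (here, the script path plus any user arguments) gets forwarded to the launched binary, mirroring how ${ARGUMENTS} is forwarded in the C script above:

#!/bin/bash

run_command () {

    TIMEOUT_SECONDS_RUN=10s

    # No compilation step; create the output log up front
    touch ${OUTPUT_NAME}-output.log
    chmod 753 ${OUTPUT_NAME}-output.log

    if [ "${ARGS_FILE_EXISTS}" = "true" ]; then
        ARGUMENTS=$(cat ${ARGUMENTS_PATH})
    fi
    if [ "${STDIN_FILE_EXISTS}" = "true" ]; then
        TIMEOUT_SECONDS_RUN=${INTERACTIVE_TIMEOUT}
        STDIN_ARGUMENTS="-s ${STDIN_PATH}"
    fi

    # Hypothetical: run the interpreter directly on the source file
    sudo su-exec exec ${EXEC_PATH}/executor -t ${TIMEOUT_SECONDS_RUN} ${STDIN_ARGUMENTS} -o ${OUTPUT_NAME}-output.log -f /usr/bin/python3 ${OUTPUT_NAME} ${ARGUMENTS}
    result=$?
}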

Once the execution component has completed, the process return code is returned to the main script and subsequently the background process. The implementation of the execution component will be covered in-depth in the next post. After the discussion of the execution component, the end-to-end details of how the system works will be complete.

Creating a multi-language compiler system: File Watcher, C++ (4/11)

Filed under: Programming — admin @ 10:28 PM


These next few posts will go into detail about the file watcher component. As previously discussed, this component is responsible for watching for changes in the input directory and kick-starting the compilation and execution process. The file watcher will be implemented as a background process, written in C++, along with some helper Bash scripts.

Configuration

The runtime configuration for the file watcher will be a straightforward JSON file, with deserialization courtesy of the cereal library. This configuration contains everything the file watcher needs to perform its functionality: the paths to the various input, output, and intermediate directories and files, the set of supported languages, and whether to run in single-threaded or multi-threaded mode. The final configuration is shown below. Looking at it, you can see that there are several variables that are meant to be substituted; how and why this is done will be covered in a future post. A brief explanation of each field is provided below.

{
     "configuration": {
         "inputpath": "${CODE_PATH}/share/${LANGUAGE}/input/${UNIQUE_ID}",
         "outputpath": "${CODE_PATH}/share/${LANGUAGE}/output/${UNIQUE_ID}",
         "workspacepath": "${CODE_PATH}/share/${LANGUAGE}/workspace/${UNIQUE_ID}",
         "dependenciespath": "${CODE_PATH}/share/${LANGUAGE}/dependencies",
         "argumentspath": "${CODE_PATH}/share/${LANGUAGE}/arguments/${UNIQUE_ID}",
         "stdinpath": "${CODE_PATH}/share/${LANGUAGE}/stdin/${UNIQUE_ID}",
         "interactivetimeout": "600s",
         "relocatepath": "${CODE_PATH}/share/relocate/${UNIQUE_ID}",
         "bootstrappath": "${CODE_PATH}/share/bootstrap",
         "bootstrapscriptname": "bootstrap-common.sh",
         "supportedlanguages": [${SUPPORTED_LANGUAGES}],
         "ismultithreaded": ${IS_MULTITHREADED}
     }
 }

Hopefully most of these are pretty straightforward from their naming, or from having read the previous posts outlining the general architecture of the system. The inputpath is the folder where the user will provide the source code file to compile, and correspondingly the outputpath is where the execution output will be written to. The workspacepath is the folder where the compilation and execution will take place. As mentioned previously, this is to allow for multiple compilation processes at the same time without fear of one interfering with another.

The dependenciespath has not been discussed yet; it is the path where the dependencies for each language live. Some languages need more than just a source file to compile: specifically, to compile C# on the command line under .NET Core, there must be a .csproj file present in the directory. When the compilation process begins, everything present in the dependenciespath for a language is copied to the workspace path. Of the languages supported under this particular system, this is only an issue for C#; all others have an empty dependenciespath.

The argumentspath, stdinpath, and interactivetimeout all have to do with providing command-line input to a running executable. The argumentspath is the folder where the command-line arguments for the source code are stored. Likewise, the stdinpath is the folder where the input for the interactive session is stored. This file can change during the execution of a program, and its changes will be picked up and written to the stdin of the running process by the execution component. The interactivetimeout is the time limit for this interactive session, during which a user can provide input to the running process.
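
Putting these fields together, submitting a program from the user’s perspective amounts to dropping files into these folders. A hypothetical C submission might look like the following; the container ID, the index of 0, and the file naming conventions are illustrative:

# Hypothetical paths; the .args/.stdin naming convention comes from the
# file watcher's buildCommand, shown later in this post
SHARE=/home/user/code/share/c
ID=3f9ab21c77de

echo "arg1 arg2" > ${SHARE}/arguments/${ID}/program.c.args   # optional arguments first
cp program.c ${SHARE}/input/${ID}/                           # triggers the file watcher
cat ${SHARE}/output/${ID}/0/program.c.log                    # read the serialized result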

For resiliency, there is a relocatepath field: the directory where input files that have not yet been processed go in the event of a crash. This is done so that they may be relocated to another active instance for processing. The next two fields, bootstrappath and bootstrapscriptname, are for the Bash scripts that perform the core functionality: setting up the workspace folders, compiling the code, and invoking the execution component to run the executable and capture the output. The implementation of these Bash scripts will be covered in the next post.

Lastly, there are two hopefully self-explanatory fields, supportedlanguages and ismultithreaded, which contain the list of supported languages for the compiler system and whether to run in multi-threaded mode.

This configuration file has a corresponding object in the file watcher code. The NotifyConfiguration object is defined below:

struct NotifyConfiguration
{
    std::string m_inputPath;
    std::string m_outputPath;
    std::string m_workspacePath;
    std::string m_dependenciesPath;
    std::string m_argumentsPath;
    std::string m_stdinPath;
    std::string m_interactiveTimeout;
    std::string m_relocatePath;
    std::string m_bootstrapPath;
    std::string m_bootstrapScriptName;
    std::vector<std::string> m_supportedLanguages;
    bool m_isMultithreaded;
};

The code to read this file at runtime is pretty straightforward thanks to the ease of the cereal API:

static std::shared_ptr<T> Read(const std::string& filePath, const std::string& name)
{
    std::ifstream fileStream(filePath);
    if (!fileStream.is_open())
    {
        std::cerr << "Failed to open " << filePath << std::endl;
        return nullptr;
    }

    T object;
    {
        cereal::JSONInputArchive iarchive(fileStream);
        iarchive(cereal::make_nvp(name.c_str(), object));
    }
  
    return std::make_shared<T>(object);
}

Event handlers

Listening for directory change events is what powers the entire compilation and execution process. The previous post covered the inotify API, and this implementation just expands on it a bit. Since multiple languages are supported, there needs to be a mapping between a language and an input folder:

std::unordered_map<std::string /*Language*/, std::unique_ptr<NotifyEventHandler> /*Handler*/> m_dispatchTable;

At runtime, each supported language will register a handler and add it to the map.

bool NotifyEventHandlerTopmost::AddHandler(std::unique_ptr<NotifyEventHandler> handler)
{
	const std::string& language = handler->Language();
	if (m_dispatchTable.find(handler->Language()) != m_dispatchTable.end())
	{
		std::cerr << "Handler for " << language << " already exists." << std::endl;
		return false;
	}

	std::cout << "Adding handler for " << language << std::endl;
	m_dispatchTable.insert({ handler->Language(), std::move(handler) });
	return true;
}

When an input file is added, its handler is found in the map, and if it exists, that handler subsequently gets invoked:

void NotifyEventHandlerTopmost::Handle(std::shared_ptr<NotifyEvent> event)
{
    const std::string& type = event->Language();
    auto handler = m_dispatchTable.find(type);
    if (handler == m_dispatchTable.end())
    {
        std::cerr << "No handler found for " << event->Language() << " files" << std::endl;
        return;
    }

    handler->second->Handle(event);
 }

Under this current implementation, all languages share a common handler, which launches a child process to compile and execute the source code, and subsequently reads the captured output.

void NotifyEventHandlerLanguage::Handle(std::shared_ptr<NotifyEvent> event)
{
    NotifyChildProcess childProcess(event);
    childProcess.Launch();

    auto outputPath = event->OutputPath() + std::filesystem::path::preferred_separator + std::to_string(event->Index()) +
		std::filesystem::path::preferred_separator + event->FileName() + ".log";
    const NotifyExecutionResult resultObject{ childProcess.Result(), childProcess.Output() };
    NotifySerializer<NotifyExecutionResult>::Write(outputPath, "result", resultObject);
}

Child process

The child process code is rather minimal. Since most of the workflow is delegated to a Bash script, the only things the code needs to do are format the arguments, call the Bash script, and read the output. The Bash script takes in a fair number of arguments, whose usage and purpose will be explained in the next post.


void NotifyChildProcess::buildCommand(std::shared_ptr<NotifyEvent> notifyEvent)
{
	const auto argumentsFullPath = notifyEvent->ArgumentsPath() + std::filesystem::path::preferred_separator + notifyEvent->FileName() + ".args";
	const auto inputFileFullPath = notifyEvent->InputPath() + std::filesystem::path::preferred_separator + notifyEvent->FileName();
	const auto stdinFileFullPath = notifyEvent->StdinPath() + std::filesystem::path::preferred_separator + notifyEvent->FileName() + ".stdin";

	m_builtCommand.reserve(512);
	m_builtCommand = notifyEvent->BootstrapPath() + std::filesystem::path::preferred_separator + notifyEvent->BootstrapScriptName()
		+ " -f " + inputFileFullPath
		+ " -a " + argumentsFullPath
		+ " -s " + stdinFileFullPath
		+ " -t " + notifyEvent->InteractiveTimeout()
		+ " -i " + std::to_string(notifyEvent->Index())
		+ " -d " + notifyEvent->DependenciesPath()
		+ " -w " + notifyEvent->WorkspacePath()
		+ " -o " + notifyEvent->OutputPath()
		+ " -l " + notifyEvent->Language();
}

bool NotifyChildProcess::Launch()
{
	auto pipe = popen(m_builtCommand.c_str(), "r");
	if (!pipe)
	{
		perror("popen");
		std::cerr << "Could not execute " << m_builtCommand << std::endl;
		return false;
	}

	m_result = pclose(pipe);

	m_output = NotifyFile::ReadFile(OutputFilePath());

	return Success();
}

Multi-threading

The last feature to cover relates to how the events (the user adding a source file to the input directory) will be processed: serially or in parallel. As mentioned above, this is controlled via the ismultithreaded configuration parameter. Under the typical scenario, an event comes in, gets processed, a child process is launched, and the thread blocks until the child process terminates. This works fine, but can be improved by taking advantage of parallelism. There shouldn’t be any dependencies between different users’ input files, so there shouldn’t be anything stopping us from running this process in parallel*.

To run in parallel, the handler for the event is called on a separate thread. This is done by taking advantage of a third-party thread-pool library. The event dispatch code is shown below:


void NotifyEventDispatcher::dispatchEvent(const inotify_event* pEvent)
{
	std::cout << "Read event for watch " << pEvent->wd << std::endl;

	if (pEvent->len <= 0)
	{
		std::cerr << "No file name associated with event. Watch descriptor = " << pEvent->wd << std::endl;
		return;
	}

	if (pEvent->mask & IN_CLOSE_WRITE)
	{
		std::string fileName((char*)pEvent->name);
		auto config = m_manager->Configuration();
		std::shared_ptr<NotifyEvent> notifyEvent(new NotifyEvent(config, fileName, pEvent->wd, pEvent->mask));

		if (m_manager->Configuration()->m_isMultithreaded && m_threadPool != nullptr)
		{
			m_threadPool->enqueue([this](std::shared_ptr<NotifyEvent> notifyEvent)
				{ m_handler->Handle(notifyEvent); },
				notifyEvent);
		}
		else
		{
			m_handler->Handle(notifyEvent);
		}
	}
}

This concludes the C++ portion of the file watcher. The next post will cover the Bash script portion and detail how the input file gets compiled and executed.

* I haven’t fully evaluated this. Although the code itself can run fine multi-threaded, there may be compilers that aren’t friendly to having multiple instances run at the same time.

Creating a multi-language compiler system: The inotify API (3/11)

Filed under: Programming — admin @ 10:28 PM


The compiler system, as described and designed in the previous posts, relies on being able to respond to user input, i.e. when a user adds their source file to the queue (folder). Accomplishing this requires monitoring the input folder for changes and beginning the compilation and execution process when a new source file is detected. Given that this project will be mostly in C++, there are multiple ways of implementing this feature, ranging from platform-independent solutions such as std::filesystem to platform-specific ones like inotify on Linux or FindFirstChangeNotification on Windows.

As mentioned in the previous post, we are targeting the Linux platform, so the inotify API seems like a natural and simple solution. This API monitors changes on the filesystem and notifies registered applications of those changes, which is exactly what we want. The inotify API exposes three functions: inotify_init, inotify_add_watch, and inotify_rm_watch. The first creates an inotify instance and returns a file descriptor to it. The other two take this file descriptor and allow you to add or remove a watch on a directory, a watch here meaning a monitor for filesystem changes.

A mask is passed to inotify_add_watch specifying which types of events to notify on, i.e. file or directory move, close, open, delete, etc. The full list of flags can be found in the inotify(7) man page. Since we are interested in being notified when a user has added their file to the input folder, the mask that is needed is IN_CLOSE_WRITE. After the watch is added on the directory, the application will receive an inotify_event each time a file is added to the folder. The inotify_event structure has the following definition:

struct inotify_event {
    int      wd;       /* Watch descriptor */
    uint32_t mask;     /* Mask describing event */
    uint32_t cookie;   /* Unique cookie associating 
                          related events (for rename(2)) */
    uint32_t len;      /* Size of name field */
    char     name[];   /* Optional null-terminated name */
};

This event tells us all we need to know: which watch triggered the event, which mask the event corresponds to, and the file name. We will listen for these events in a loop and process them as they come in. Sample code showing how to use this API is provided below. This code will serve as the template for how the file watcher component, covered in the next series of posts, will be implemented.

#include <array>
#include <climits>
#include <iostream>

#include <sys/inotify.h>
#include <unistd.h>

void MonitorDirectoryChange(const int notifyFileDescriptor)
{
	constexpr auto BUFFER_SIZE = (10 * (sizeof(struct inotify_event) + NAME_MAX + 1));
	std::array<char, BUFFER_SIZE> readBuffer;
	while (true)
	{
		auto bytesRead = read(notifyFileDescriptor, readBuffer.data(), readBuffer.size());
		if (bytesRead == -1)
		{
			perror("read");
			break;
		}

		for (auto* bufferStart = readBuffer.data(); bufferStart < readBuffer.data() + bytesRead; /*Empty*/)
		{
			inotify_event* pEvent = (inotify_event*)bufferStart;
			if (pEvent->mask & IN_CLOSE_WRITE)
			{
				std::string fileName((char*)pEvent->name);
				std::cout << "File has been added: " << fileName << std::endl;
			}

			bufferStart += sizeof(inotify_event) + pEvent->len;
		}
	}
}

int main(int argc, char* argv[])
{
	if (argc != 2)
	{
		std::cerr << "Incorrect number of arguments. Specify a directory to watch";
		exit(EXIT_FAILURE);
	}

	int notifyFileDescriptor = inotify_init();
	if (notifyFileDescriptor == -1)
	{
		perror("inotify_init");
		exit(EXIT_FAILURE);
	}

	int watchDescriptor = inotify_add_watch(notifyFileDescriptor, argv[1], IN_CLOSE_WRITE);
	if (watchDescriptor == -1)
	{
		perror("inotify_add_watch");
		exit(EXIT_FAILURE);
	}

	std::cout << "Beginning monitoring on " << argv[1] << std::endl;
	MonitorDirectoryChange(notifyFileDescriptor);
	std::cout << "Finished monitoring" << std::endl;

	return EXIT_SUCCESS;
}

In this example program, the directory to watch is provided via the command line. If everything is successful, a watch for IN_CLOSE_WRITE events is added. These events are then monitored in a loop, and the program outputs the names of files as they are added to the directory. A screenshot of the execution is shown below, with a few files added to the target directory:

Output of the program when files have been added to the monitored directory
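
To reproduce this locally, the sample can be compiled and pointed at a scratch directory; the file and directory names here are arbitrary:

# Assuming the sample above is saved as monitor.cpp
g++ -std=c++17 -o monitor monitor.cpp
mkdir -p /tmp/watched
./monitor /tmp/watched &

# Trigger an IN_CLOSE_WRITE event; the monitor prints the file name
echo "int main() { return 0; }" > /tmp/watched/test.c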

The next series of posts will cover the file watcher component of the system. This component will leverage the inotify API to pick up added source code files and begin the compilation and execution process.
