The current Doctrine way of deciding whether to insert or update an entity by a unique column is quite useless. The official way to do it is to fetch the entity by the unique key and, if it is found, update it; if it's not found, insert it. Such an approach is good for a live demo website during a conference talk, but it is kinda useless in a real-world app with many concurrent requests and workers, as an entity with the same key can be created between the fetch and the insert. Then either a duplicate is created or, if the column is unique at the database level, the insert fails, the entity manager is closed and the user gets a 500 error.
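To make the race concrete, here is a tiny, deterministic simulation in plain PHP. It does not touch Doctrine at all: the `$table` array stands in for a database table without a unique index, and the variable names are made up for illustration. Both workers run their check before either one inserts, which is the interleaving that bites in production.

```php
<?php

// Hypothetical stand-in for a table whose "unique" column
// is NOT enforced at the database level.
$table = [];
$email = 'jane@example.com';

// Step 1: both workers look for an existing row before either has inserted.
$workerASawRow = in_array($email, $table, true);
$workerBSawRow = in_array($email, $table, true);

// Step 2: each worker acts on its now stale answer.
if (!$workerASawRow) {
    $table[] = $email; // worker A inserts
}
if (!$workerBSawRow) {
    $table[] = $email; // worker B inserts the same value again
}

// Two rows now hold the same supposedly unique value.
$duplicates = count($table);
```

With a unique index in place, the second insert would fail instead of creating a duplicate, which is the error case described above.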

That's not fun. Issues like that are hard to find, hard to replicate and, it turns out, even harder to fix.

One option is to fetch and insert or update within a transaction. That sounds good, but in a real-world application the fetch usually happens in a completely different class than the save, which comes after all the work on the entity has been done, the entity has been sent to several subscribers and so on. So transactions are kinda hard to manage inside the code base. And the row may be locked for a fairly long time, which can cause other queries to queue up. That can hit the timeout limit or run out of connections when multiple processes are waiting and keeping their connections alive while new ones are still being created.

We are using MySQL for Mautic, so we were looking into how to solve the problem for MySQL with INSERT INTO ... ON DUPLICATE KEY UPDATE .... The good thing about Doctrine is that it can support multiple databases, but that's also its curse. There are open issues asking for upsert support and there were attempts to add it, but it's not an easy thing to implement for all the supported databases.
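For reference, the shape of the statement we are after can be sketched with a small plain-PHP helper. This is only an illustration: `buildUpsertSql()`, the `contacts` table and its columns are hypothetical names, and the helper does not quote or validate identifiers.

```php
<?php

/**
 * Builds a MySQL upsert statement with positional placeholders.
 * Hypothetical helper for illustration only; identifiers are not quoted or validated.
 *
 * @param string[] $columns
 */
function buildUpsertSql(string $table, array $columns): string
{
    // One "?" placeholder per column.
    $placeholders = array_fill(0, count($columns), '?');

    // On a duplicate key, overwrite each column with the value from the attempted insert.
    $updates = array_map(
        static fn (string $column): string => "{$column} = VALUES({$column})",
        $columns
    );

    return 'INSERT INTO '.$table.' ('.implode(', ', $columns).')'
        .' VALUES ('.implode(', ', $placeholders).')'
        .' ON DUPLICATE KEY UPDATE '.implode(', ', $updates);
}

// Made-up table and columns.
$sql = buildUpsertSql('contacts', ['email', 'firstname']);
// INSERT INTO contacts (email, firstname) VALUES (?, ?)
//   ON DUPLICATE KEY UPDATE email = VALUES(email), firstname = VALUES(firstname)
```

The point of this form is that MySQL decides between insert and update inside a single statement, so there is no gap between a check and a write for another worker to slip into.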

So we implemented our own upsert method in a common repository. It's not public yet, only living in our fork at this point and will be pushed later so I'll share what it looks like in a code block.

    /**
     * Insert entity if it does not exist, update if it does.
     * ID is set to the entity after upsert.
     * Main reason to use this over fetch/save is to avoid race conditions.
     *
     * Warning: This method uses DBAL, not ORM. It will save only the entity you send it.
     * It will NOT save the entity's associations. Entity manager won't know that the entity was flushed.
     *
     * Requires `use Doctrine\DBAL\Types\Types;` at the top of the file.
     */
    public function upsert(object $entity): void
    {
        $connection = $this->getEntityManager()->getConnection();
        $metadata   = $this->getClassMetadata();
        $identifier = $metadata->getSingleIdentifierFieldName();
        $makeUpdate = fn (string $column) => "{$column} = VALUES({$column})";
        $columns    = [];
        $values     = [];
        $types      = [];
        $set        = [];
        $update     = [];
        $hasId      = $metadata->containsForeignIdentifier;

        foreach ($metadata->getFieldNames() as $fieldName) {
            $value = $metadata->getFieldValue($entity, $fieldName);
            if ($metadata->isIdentifier($fieldName)) {
                if ($value) {
                    $hasId = true;
                } else {
                    continue;
                }
            }
            $column    = $metadata->getColumnName($fieldName);
            $columns[] = $column;
            $values[]  = $value;
            $types[]   = $metadata->getTypeOfField($fieldName);
            $set[]     = '?';
            $update[]  = $makeUpdate($column);
        }

        foreach ($metadata->getAssociationNames() as $fieldName) {
            $assocEntity = $metadata->getFieldValue($entity, $fieldName);
            if (!$metadata->isAssociationWithSingleJoinColumn($fieldName) || !is_object($assocEntity)) {
                continue;
            }
            $idCol     = ucfirst($metadata->getSingleAssociationReferencedJoinColumnName($fieldName));
            $idGetter  = "get{$idCol}";
            $column    = $metadata->getSingleAssociationJoinColumnName($fieldName);
            $columns[] = $column;
            $values[]  = $assocEntity->$idGetter();
            $types[]   = Types::STRING;
            $set[]     = '?';
            $update[]  = $makeUpdate($column);
        }

        $connection->executeStatement(
            'INSERT INTO '.$this->getTableName().' ('.implode(', ', $columns).')'.
            ' VALUES ('.implode(', ', $set).')'.
            ' ON DUPLICATE KEY UPDATE '.implode(', ', $update),
            $values,
            $types
        );

        if ($hasId) {
            return;
        }

        $id = (int) $connection->lastInsertId();

        $metadata->setFieldValue($entity, $identifier, $id);
    }

This is basically just a slightly modified Connection::insert() method. Notice that it sets the ID back to the entity. So elegant... And it works great! Until you try to replace the problematic old code that may look like this:

    $entity = $repository->findOneBy(['uniqueColumn' => $uniqueValue]) ?? new SomeEntity();
    $child = new SomeChildEntity();
    $entity->addChild($child);
    $child->setParent($entity);
    // whatever you need to do here.
    $entityManager->persist($child);
    $entityManager->persist($entity);
    $entityManager->flush();
    // some more work done with other entities later on.

with the new upsert method:

    $entity = new SomeEntity();
    $child = new SomeChildEntity();
    $entity->addChild($child);
    $child->setParent($entity);
    // whatever you need to do here.
    $repository->upsert($entity);
    $entityManager->persist($child);
    $entityManager->flush();
    // some more work done with other entities later on.

And the rest of the code will break hard because the related $child entities are saved later on and the Entity Manager will perform another insert for the $entity, which will lead either to duplicates or to an SQL error, depending on the column uniqueness configuration.

There must be a way to tell the Entity Manager that an entity with a manually set ID does not need an INSERT query. Sadly, I couldn't find one, either by searching the web or by reading the code. Please let me know if you have a solution for this!

What I ended up doing was:

    $entity = new SomeEntity();
    $child = new SomeChildEntity();
    // whatever you need to do here.
    $repository->upsert($entity);
    $child->setParent($entityManager->getReference(SomeEntity::class, $entity->getId()));
    $entityManager->persist($child);
    $entityManager->flush();
    // some more work done with other entities later on.

Yep, using references. Not a pretty workaround. And you may have to hunt this down in many places in your existing code when you switch to the upsert method and replace the actual entities with references. One could even think "what is the reason to use an ORM at this point?", right?

Or if you don't care about the extra query:

    $entity = new SomeEntity();
    $child = new SomeChildEntity();
    // whatever you need to do here.
    $repository->upsert($entity);
    // Make sure the entity is not managed by the entity manager, so the next line does what we want.
    $entityManager->detach($entity);
    $entity = $entityManager->find(SomeEntity::class, $entity->getId());
    $child->setParent($entity);
    $entityManager->persist($child);
    $entityManager->flush();
    // some more work done with other entities later on.

The reason I'm writing this down is that I cannot believe there is no simpler way to do this, and I hope that someone will set me straight so I can refactor the code and this article. If there isn't a better solution, then this may help someone who is having the same concurrency issues.
