Интервью с Джулианом Эгельстаффом, техническим архитектором Freeform Solutions

Недавно у меня была возможность встретиться с Джулианом Эгельстаффом , техническим архитектором в Freeform Solutions . Джулиан присутствовал на прошлогоднем конкурсе SQL Server Лагерь, где он работал с инженерами Microsoft и другими участниками, чтобы добавить поддержку SQL Server в Formulize , приложение PHP, которое позволяет быстро создавать формы и отчеты на вашем веб-сайте. Джулиан является соучредителем Freeform Solutions , имеет 10-летний опыт разработки PHP и является сертифицированным инженером Zend .

Помимо того, что я просто догонял, моя цель в разговоре с Джулианом была двоякой:

В общем, я хотел лучше понять, как он преодолел проблемы, связанные с расширением приложения для поддержки нескольких баз данных, и…
В частности, я хотел лучше понять, как он преодолел проблемы, с которыми он столкнулся при добавлении поддержки SQL Server в приложение.

Мне нравится думать, что задним числом является 20-20, так что по прошествии нескольких месяцев после JumpIn! Кэмп, я подумал, что сейчас самое время подумать над этими проблемами. К счастью, Джулиан был готов поделиться своим опытом …

Брайан : Во-первых, я знаю, что вы очень заняты, поэтому я хочу сказать спасибо, что нашли время поговорить со мной: Спасибо! ? Может быть, вы могли бы начать рассказывать мне немного о вашем фоне разработки?

Джулиан : Конечно. Я думаю, я довольно типичный PHP-разработчик — если типичный PHP-разработчик не программист по образованию. PHP имеет репутацию легкого в изучении, и есть много людей, использующих PHP, которые не обязательно имеют компьютерный опыт, и с моим дипломом бакалавра журналистики я прямо в этом лагере.

Тем не менее, я думаю, что я довольно нетипичен в этом, я пошел и стал одним из основателей технологического стартапа под названием Freeform Solutions и выполнял большую часть работы по разработке в организации в течение первых нескольких лет. Я получил сертификат ZCE по пути. Просто чтобы сделать вещи более нетипичными, Freeform Solutions не было обычным стартапом. Мы некоммерческая организация, и мы в первую очередь нацелены на то, чтобы помочь другим некоммерческим организациям использовать технологии для выполнения своих задач.

Брайан : Одно из решений, которое вы предлагаете некоммерческим организациям, — Formulize . И это приложение, над которым вы работали в JumpIn! Лагерь. Можете ли вы рассказать мне больше о Formulize?

Джулиан : Formulize выросла из ранней работы в Freeform Solutions. Было два главных влияния, которые сформировали это.

Во-первых, во многих организациях, с которыми мы работали, была общая потребность в некоторых простых инструментах управления данными… формах и отчетах, стандартных вещах, которые нужны всем. За исключением некоммерческих организаций, с которыми мы работали, у них не было ресурсов, позволяющих ИТ-специалистам постоянно настраивать и строить для них какую-то систему отчетности. Им нужно было иметь возможность самостоятельно добавлять новые формы и иметь легкий доступ к информации, которую люди вводят в формы, без постоянной поддержки кого-либо.

Так что Formulize — это прежде всего решение такой проблемы. Это что-то вроде CMS для форм и данных, а не для «контента». Нетехнический пользователь может создать форму, описать через графический интерфейс пользователя, как она вписывается в шаблоны рабочих процессов организации, как она связана с данными в других формах и т. Д. И затем они также могут составлять базовые, а не базовые, экраны отчетности и другие экраны интерфейса, чтобы люди могли взаимодействовать с формой и ее данными.

Второе влияние было нашей потребностью, моей необходимостью, чтобы иметь возможность повторно использовать все, что мы сделали, как можно больше. Таким образом, начиная с версии 2, Formulize был разработан очень обобщенно. Он может использоваться CMS-подобным образом нетехническими пользователями, но под капотом он спроектирован больше как фреймворк.

Теперь все говорят, что они являются основой, особенно в мире CMS, потому что все пытаются охватить все основы. Но я думаю, что Formulize уникален в этом отношении. Все другие системы, о которых я знаю, которые происходят из традиции CMS, могут быть framework-ish и позволять вам писать код для этой системы в framework-ish-стиле. Но ваш код все еще застрял внутри этой системы. Вы не можете установить свои модули Drupal в WordPress, и наоборот.

Formulize может быть подключен к Drupal, WordPress и любой другой системе PHP из-за своей очень обобщенной архитектуры. Он не делает никаких предположений о том, откуда он вызывается. После JumpIn! Лагерь, я провел около часа в TangoCMS (один из других проектов, который был в лагере), а через час у меня было работающее приложение Formulize для проверки концепции внутри TangoCMS. Я думаю, именно это делает Formulize действительно интересной с точки зрения кода.

Брайан : Это есть интересный … и кажется , что это может быть корм для другого блога. ? Интересно, что Formulize имеет архитектуру, похожую на фреймворк — я уверен, что работа по добавлению нескольких баз данных отличалась от той, что была бы для многих других приложений. Прежде чем углубляться в эти детали, расскажите, почему вы хотели добавить поддержку нескольких баз данных в Formulize (в частности, поддержку SQL Server)?

Джулиан : На самом базовом уровне это просто хорошая практика с открытым исходным кодом, чтобы быть максимально совместимым со всем и всем.

Но на более прагматичном уровне я знал, что в коде было много предположений, связанных с MySQL. Возможность увидеть, как далеко мы можем продвинуться с помощью MSSQL, была хорошим способом узнать, где находятся препятствия и что нам нужно сделать, чтобы их обойти. Это полезно для MSSQL и любой другой базы данных, для которой мы хотим добавить поддержку.

Кроме того, мы говорили с специалистами по установке веб-платформ в Microsoft о добавлении Formulize в приложения, которые он может устанавливать, и, конечно, было бы неплохо иметь поддержку SQL Server для пользователей, которые полностью работают на платформе Microsoft.

Брайан : Когда дело дошло до добавления поддержки нескольких баз данных, я понимаю, что вы на раннем этапе сделали выбор не использовать PDO или какой-либо другой уровень абстракции базы данных. Это верно? Можете ли вы уточнить?

Джулиан : Ну, изначально это правильно. Как и во многих проектах с открытым исходным кодом, я думаю, мы ничего не решили, мы просто использовали то, что имели. Formulize начал свою жизнь как ответвление другой базы кода (длинная история), и в основном мы просто следовали этому примеру. У вилки, с которой мы начинали, была только поддержка MySQL. Поскольку MySQL всегда был доступен в средах, в которых мы работаем, это никогда не было проблемой. Итак, при доступе к MySQL мы используем класс, который построен на процедурном драйвере php_mysql. Кроме того, Formulize предшествует PDO, так что это еще один фактор. Но это не совсем PDO / не-PDO, что является главной проблемой здесь. PDO просто обеспечивает соединение, реальная абстракция БД — более серьезная проблема.

Но, поскольку я шел по пути от нетехнического разработчика PHP к разработчику PHP, который был вокруг всего этого блока, мне всегда казалось неправильным, что мы так привязаны к одному конкретному хранилищу данных. Но, как и в случае с большим количеством программного обеспечения с открытым исходным кодом, если у кого-то, кто использует его, нет реальной необходимости, новые функции обычно не добавляются. Все, кого мы знаем, были достаточно счастливы, чтобы выполнить нулевую работу и придерживаться существующей опции базы данных, то есть MySQL. С учетом вышесказанного, есть некоторые пользователи, которые используют Formulize в гетерогенных средах, где есть много разных баз данных и приложений, выполняющих различные задания во всей их ИТ-системе. Я знаю, что есть люди, которые используют IIS и, возможно, используют SQL Server в своей организации. Предоставление им возможности использовать Formulize просто имело смысл для всех.

Несмотря на это, идея уровня абстракции базы данных привлекательна, поэтому, когда мы добавили поддержку SQL Server, мы написали класс, который использует драйвер PDO_SQLSRV. Мы все еще хотели бы преобразовать наш код доступа MySQL в PDO, но этого еще не произошло.

Брайан : Итак, когда вы сели добавить поддержку SQL Server, каков был ваш план атаки?

Джулиан : Ну, я подумал, что большинство основных проблем, которые мы затронули первыми, будут синтаксическими. Просто различия между тем, как вы пишете SQL для MySQL против SQL Server. Поэтому я хотел посмотреть, насколько это будет плохо, о каких различиях мы говорим.

I’m a strong, strong believer in doing as little as necessary in the code when making changes. This comes from the heritage of trying to reuse everything as much as possible, and never having enough time and budgets to do the typical «this-version-rewritten-from-scratch» kind of thing that you see in some open source projects.

So my goal was to find what was the smallest set of changes we could make that would therefore likely have the least impact on the existing code base, and end up providing SQL Server support. If the existing SQL queries were mostly OK, then that would be a big step in that direction.

Brian: How did you identify SQL and server-specific differences (between MySQL and MSSQL) when adding MSSQL support?

Julian: Trial and error was the biggie. ? But there are other areas where we knew, a priori, that there were roadblocks, like the encryption option.

You can choose to encrypt data in Formulize, at the database level, if you really, really want to secure it against that kind of attack. We use the MySQL AES encryption function, we embed it in the SQL statements. So architecturally, that’s going to be a problem for abstracting support for other databases.

What we found worked for about 90% of the features in Formulize, was simply string manipulation of the SQL just prior to sending it to the DB. There’s a bunch of predictable syntactical differences between MySQL and MSSQL, such as backticks versus square brackets, SHOW COLUMNS needs to change to a SELECT statement on the information schema, the names of field types are systematically different, ie: char=>nchar, and on and on.

Brian: Can you elaborate on how you resolved the encryption issue?

Julian: Well, that is one area where there isn’t yet a committed fix. The encryption feature is something of an «edge case», it’s not widely used, as far as I’m aware, so it was pretty low on the list of things to tackle. From what I’ve read, I think the EncryptByKey and DecryptByKey functions may do the same kind of thing in SQL Server as the AES_ENCRYPT and AES_DECRYPT functions do in MySQL. But this is something I would want to get more info about from the SQL Server devs if possible. Nothing is ever simple.

Brian: I’ll see if I can get you in touch with some folks that could help with that. In the meantime, can you provide an example of how string manipulation solved the predictable syntactical differences you mentioned?

Julian: Well, there’s really simple examples, like this:

$sql = str_replace("NOW(), ", "SYSDATETIME(), ", $sql);

NOW is a MySQL function, SYSDATETIME is simply the equivalent function in SQL Server. But there’s also more advanced kinds of stuff, like when we’re getting info about the fields in a table. In MySQL, you use something called SHOW COLUMNS but in SQL Server, you have to query the «information_schema». So we do some more advanced analysis:

if(substr($sql, 0, 12)=="SHOW COLUMNS") {
    $fromPos = strpos($sql, "FROM");
    $tableName = substr($sql, $fromPos+5);
    $sql = "SELECT column_name
            FROM information_schema.columns
            WHERE table_name='" . trim($tableName) . "'";
    if($likePos = strpos($tableName, "LIKE")) {
        $likeName = substr($tableName, $likePos+5);
        $sql = substr($sql, 0, 69+$likePos);
        $sql .= "' AND column_name = ".trim($likeName);
    }
}

So that basically rewrites a SHOW COLUMNS statement from MySQL, into a completely new SQL Server query. And just pulls out the necessary parameters from the original statement, and sticks them into the SQL Server version in the right place. All just using simple string functions in PHP. This is a pretty fast operation in PHP, it’s just string manipulation in memory, so it adds very little overhead to your queries.

Something important to emphasize though, is that this is not an exhaustive replacement for SHOW COLUMNS in MySQL. It’s simply an effective replacement for how we’re using it in Formulize. For example, we’re not handling the situation where SHOW COLUMNS has a WHERE clause.

So we could get away with simple solutions like this that covered our use cases just fine. It does set up a maintenance issue though. If we add a SHOW COLUMNS statement with a WHERE clause, then we have to do more work here. So moving eventually to real DB abstraction would be preferable from that point of view. But this code works here works today, and it’s a very minimal intervention, and took very little time to create.

Brian: I know that you faced a challenge that is common when adding SQL Server support to a MySQL application: dealing with the MySQL LIMIT clause. How did you get around the LIMIT clause in your SQL Server implementation?

Julian: In true open source fashion, we borrowed from another project. ?

The good folks at Moodle had already integrated MSSQL support into their app, and they had written a function that took some SQL and two parameters for offset and limit, and returned to you a valid MSSQL statement that would mimic the behavior of a LIMIT clause. It uses the TOP syntax.

So basically, we just had to do some string analysis on our statements prior to execution, and if there was a LIMIT statement, then we remove it, grab the offset and limit values, and then basically just do one line of code:

$sql = $this->limit_to_top_n($sql, $offset, $limit);

Brian: Nice! That’s the most elegant solution I’ve seen for this problem. ? It looks like there were some issues with that Moodle function, but they have been resolved: http://tracker.moodle.org/browse/MDL-25321.

I’m sure you ran into some API discrepancies (php_mysql vs. php_pdo_sqlsrv). Were you able to address these as elegantly?

Julian: Well, not so much discrepancies, as idiosyncrasies that we had to be aware of, or double check how they worked.

One was returning the id of the last record that was inserted into the database. This is a pretty standard thing in lots of open source web apps that use MySQL. You do an insert statement and you get back the ID of the record you just inserted. You’re probably going to turn around and use that ID as a foreign key in some other statements.

In MSSQL, the official way to deal with these kinds of situations, I think, is that you’re probably going to package up a bunch of statements and do them as a transaction, and you’ve got the database engine handling the foreign key situation for you. But lots of open source apps that use MySQL are written to be compatible with the MyISAM storage engine, which doesn’t support transactions.

So for reasons like that, in a lot of web apps you end up handling many things at the PHP layer, which you might otherwise do in the database if you had a different database engine.

The PDO SQLSRV driver did provide the tools we needed, so we could get the necessary visibility in PHP about what the DB was doing. But we had to verify, for instance, that the lastInsertId method was bound to our database connection. I think you checked that one for me actually. Thanks. ?

And I think lastInsertId is only available in PDO, not the regular version of the SQLSRV PHP driver. Is that right?

Brian: That’s correct. If you are using the procedural SQLSRV API, there isn’t a lastInsertId function. To achieve the equivalent functionality, you would have to use the SQL Server SCOPE_IDENTITY() function. You could write a function to do this, something like…

function lastInsertId($conn){
    $stmt = sqlsrv_query($conn, "SELECT SCOPE_IDENTITY() AS id");
    $row = sqlsrv_fetch_array($stmt);
    return $row['id'];
}

This would allow you to then do something like this (with a table that has an auto-incremented id column):

// Execute an INSERT.
$sql = "INSERT INTO Table_4 (data)
        VALUES ('some_data')";
$stmt = sqlsrv_query($conn, $sql);

// Call custom function to get ID.
echo lastInsertId($conn);

But, since you are using the PDO driver, you don’t need that. Any other API differences that you had to address?

Julian: Yes. We had to use a «scrollable cursor» in order for MSSQL to be able to tell us the number of rows in a result. That’s something else which we were relying on knowing in PHP at certain points. We are setting the scrollable cursor it as part of the call to the prepare method, prior to executing the query. I believe that was the only place where we could force this behavior, I don’t think the connection options with the PDO driver supported this.

Brian: Right. In the SQLSRV API (as opposed to the PDO_SQLSRV API) you set a scrollable cursor in your connection options array when connecting to the server.

So, there were several challenges you had to tackle, but none of them seemed in surmountable. In retrospect, was going with “string manipulation” (as opposed to using an abstraction layer) a good choice?

Julian: Well, we knew what our inputs would be, ie: we didn’t have to write a complete SQL parser to take any arbitrary MySQL statement and spit out a MSSQL equivalent. So our problem space was pretty well constrained, and therefore, for most of our DB interactions, this simple string «translation» approach worked fine.

It’s the other lower level stuff, like the encryption, and certain kinds of math and other stuff beyond the basics that we backed off and thought a more structural approach will be required.

For example, we make extensive use of subqueries in some parts of the code. It gets kind of complicated to use string analysis to tell what is going on in your query when there are SELECT statements inside SELECT statements. I mean, is it a good idea to write a SQL parser in PHP? That’s the job of the database engine, no? So when your string analysis has to get that advanced and you’re not just doing glorified search-and-replace anymore, then maybe your effort is better spent on introducing some other layer that crafts the query from scratch in a way that is compatible with your database.

But, in the case of Formulize, string manipulation was a good choice — it lets us «release early, release often.» It fully works, but it doesn’t cover 100% of the problem space. But this way we can focus our refactoring on only the areas that really need it, instead of reinventing the entire way Formulize talks to the database.

This is sort of related to worse-is-better (http://www.jwz.org/doc/worse-is-better.html). I’m obviously valuing minimal-effort-to-working-code instead of a 100% «complete and correct» approach. It all depends what your priorities are, and to some extent, your philosophy.

Brian: Was anything easy about adding SQL Server support?

Julian: Well, yes…the data importing/migration. There are good tools for converting a MySQL database to MSSQL, like the SQL Server Migration Assistant. So that was nice, we didn’t have to fight with anything there. It migrated our schemas and data without any hassle.

I was developing on Windows, so I didn’t have any problem connecting to MSSQL, but the lack of a Linux driver for connecting to MSSQL is a huge barrier to adoption for devs who develop or deploy on Linux.

Brian: Yes, SQL Server access from PHP on Linux is a huge ask from PHP developers. I know that management in the SQL Server organization are aware of the request.

I’ve taken a lot of your time, so I’ll only ask one more question. Do you have any advice for developers looking to add new server support to an application?

Julian: Don’t be afraid of it…that would be the big thing. You will learn a lot about the new server for sure, but you’ll also learn a lot about your own app, because you’ll have to look at some of your assumptions from a different perspective.

Like the whole lastInsertId issue. They’re subtle things, but they arise from deep architectural decisions. They show how hard it is to actually draw boundaries between your programming language and your data store. The capabilities of each part of your stack, will impact your application in subtle and not so subtle ways. It’s hard to completely separate these layers of your app. Conversely, if you do completely separate them, are you making optimal use of the tools at your disposal? But if you want maximum portability for your app/code/system, you have to generalize to a certain extent.

In my opinion, these are the kinds of issues that make computer science interesting, not writing complex loops or OO code (which isn’t boring, but you know what I mean).

Brian: Agreed! I’m sure that any major undertaking, such as adding multi-database support, will take you down learning paths that you didn’t expect. Thanks for sharing and good luck with Formulize and Freeform Solutions!

Be sure to check out the Freeform Solutions website. You can learn more about Julian on his Freeform Solutions team page or by following him on twitter: @jegelstaff.

-Brian-

Интервью с Джулианом Эгельстаффом, техническим архитектором Freeform Solutions

Категории

Последние статьи

Рефакторинг Hudson God Class

Альтернативы синтаксиса Java лямбда

Morphia и MongoDB: развивающиеся структуры документов

OpenShift Express: развертывание приложения Java EE (с поддержкой AS7)

Интеграция jqGrid, REST, AJAX и Spring MVC