Tuesday, 2 November 2010

Convert Word documents to PDF in SharePoint Server 2010 using PowerShell

SharePoint Server 2010 Standard and Enterprise editions include a feature called Word Automation Services. There is an excellent article here explaining it in more detail, but in summary, the feature can convert documents from one Microsoft Word format to another (e.g., .doc to .docx), convert Word documents to .pdf or .xps, and execute tasks when a document is opened, such as updating the Table of Contents or index fields.

Whilst there are probably more applications for converting documents in workflows and event receivers, a batch of documents can also be converted using PowerShell. In this article I shall run through the basic steps involved in converting documents with an example of how to convert Word documents to PDF, plus some information on how you can speed up the conversion process and also remove the original Word documents from SharePoint, once successfully converted.

Before you can start converting documents, you will need to create a Word Automation Service Application instance from the Manage Service Applications page in Central Administration, and start the Word Automation Services service from the Manage services on server page, also in Central Administration. I’m also assuming that you have a site ready with a document library containing the Word documents to be converted, as shown below.

[Screenshot 1: document library containing the Word documents to be converted]
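The Central Administration steps described above can probably be scripted as well. The following is a rough sketch rather than a tested recipe: the application pool name, the service application name and the database name are all assumptions, so substitute values that suit your farm.

```powershell
# Sketch only: the pool, service application and database names are assumptions
$pool = Get-SPServiceApplicationPool "SharePoint Web Services Default"
New-SPWordConversionServiceApplication -Name "Word Automation Service Application" -ApplicationPool $pool -DatabaseName "WordAutomationServices_DB" -Default

# Start the Word Automation Services service instance
# (as written, this targets every server in the farm that lists the service)
Get-SPServiceInstance | where { $_.TypeName -eq "Word Automation Services" } | Start-SPServiceInstance
```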

Once you have your service application running, you can type the following two lines in PowerShell to connect to the Word Automation Services Proxy and set up the variable for a new conversion job:

$wasp = Get-SPServiceApplicationProxy | where { $_.TypeName -eq "Word Automation Services Proxy" }
$job = New-Object Microsoft.Office.Word.Server.Conversions.ConversionJob($wasp)

Next, we will get the site, folder containing the Word documents to be converted, and folder to store the converted PDF documents (this folder can be the same). Note that in addition to converting documents in folders, the ConversionJob class also contains methods to add individual files or complete document libraries, if preferred:

$web = Get-SPWeb http://portal/team
$inputFolder = $web.GetFolder("Shared Documents/Word Documents")
$outputFolder = $web.GetFolder("Shared Documents/PDF Documents")

We can now set up our conversion job using the properties available in the ConversionJob class. This includes the user token under which the job runs, a name for the job, PDF as the save format, and the choice to overwrite any PDF documents with the same name that already exist in the output folder:

$job.UserToken = $web.CurrentUser.UserToken
$job.Name = "Convert Hello Docs to PDF"
$job.Settings.OutputFormat = [Microsoft.Office.Word.Server.Conversions.SaveFormat]::PDF
$job.Settings.OutputSaveBehavior = [Microsoft.Office.Word.Server.Conversions.SaveBehavior]::AlwaysOverwrite
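The Settings object exposes further options beyond these two. A small sketch, assuming you want fields such as the Table of Contents refreshed during conversion and would rather keep existing PDFs than overwrite them. Neither line is part of the original script, and the second one would replace the AlwaysOverwrite line above:

```powershell
# Optional settings, not part of the original script

# Refresh fields (for example the Table of Contents) as each document is opened
$job.Settings.UpdateFields = $true

# Leave existing files in the output folder untouched instead of overwriting them
$job.Settings.OutputSaveBehavior = [Microsoft.Office.Word.Server.Conversions.SaveBehavior]::NeverOverwrite
```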

The AddFolder method below takes the input folder (the source folder containing the Word documents), the output folder (the destination folder in which the converted PDF documents will be saved), and a boolean specifying whether to include sub-folders under the input folder. For example, if this is set to $true, documents in the input folder and in all sub-folders below it will be converted:

#Add input and output folders and start conversion
$job.AddFolder($inputFolder, $outputFolder, $false)
$job.Start()
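AddFolder is not the only way to queue documents. As a rough sketch of the other two methods mentioned earlier, AddFile takes the input and output locations as URL strings and AddLibrary takes two SPList objects. The file name Report.docx and the library titles below are made up for illustration, and these calls would go in place of the AddFolder line, before $job.Start():

```powershell
# Sketch only: the file name and library titles are hypothetical

# Queue a single document; input and output are given as absolute URLs
$inputFile = $web.Url + "/Shared Documents/Word Documents/Report.docx"
$outputFile = $web.Url + "/Shared Documents/PDF Documents/Report.pdf"
$job.AddFile($inputFile, $outputFile)

# Queue every document in one library, writing the results to another
$inputLibrary = $web.Lists["Word Library"]
$outputLibrary = $web.Lists["PDF Library"]
$job.AddLibrary($inputLibrary, $outputLibrary)
```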

You can view the status of your job by typing the following commands:

$status = New-Object Microsoft.Office.Word.Server.Conversions.ConversionJobStatus($wasp.Id, $job.JobId, $null)
$status

An example of a conversion status report is shown below:

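[Screenshot 2: conversion job status report, with the documents shown as NotStarted]

If you are wondering why the status of your documents is shown as “NotStarted”, it is because they are converted by a timer job, which by default runs every 15 minutes. If you want to start this timer job now instead of waiting, type the following (the timer job name used here is the one from my environment, so adjust it if yours differs):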

$watj = Get-SPTimerJob "Word Automation Service Application"
$watj.RunNow()
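If you convert documents regularly, shortening the timer job interval may be more convenient than calling RunNow() each time. A hedged sketch, assuming the farm has a single Word Automation Services service application and that a one-minute interval is acceptable for your environment:

```powershell
# Sketch only: assumes exactly one Word Automation Services service application exists
$wasApp = Get-SPServiceApplication | where { $_.TypeName -eq "Word Automation Services" }

# Start conversions every minute instead of the default 15 minutes
Set-SPWordConversionServiceApplication -Identity $wasApp -TimerJobFrequency 1
```

The same setting is available on the service application's properties page in Central Administration.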

Once the timer job has been run, the updated status will look as follows:

[Screenshot 3: updated conversion job status report after the timer job has run]

There is a nice bit of C# code in this article from the Microsoft Word product team blog which waits until all documents have been converted and then deletes the successfully converted Word documents from the input location. I have adapted this for use in PowerShell below. Just tack it on to the end of the timer job commands if you want to include it in a PS1 script file:

[bool]$done = $false
Write-Host "Converting files - Please wait..."
while (!$done)
{
    #Wait five seconds between status checks
    Start-Sleep -Seconds 5
    $status = New-Object Microsoft.Office.Word.Server.Conversions.ConversionJobStatus($wasp.Id, $job.JobId, $null)

    #The job is finished once every item has succeeded, failed or been canceled
    if ($status.Count -eq ($status.Succeeded + $status.Failed + $status.Canceled))
    {
        $done = $true

        #Delete original Word files successfully converted to PDF
        #Remove this code if you want to keep the documents in their original location
        $itemType = [Microsoft.Office.Word.Server.Conversions.ItemTypes]::Succeeded
        $items = $status.GetItems($itemType)
        foreach ($item in $items) {
            $file = $web.GetFile($item.InputFile)
            $file.Delete()
        }
    }
}
Write-Host "Conversion operation complete - Status report:"
$status
#Release the SPWeb object now that the script has finished with it
$web.Dispose()
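The loop above waits indefinitely, so it never returns if the timer job does not run. One possible variation, sketched here under my own assumptions rather than taken from the original C# code, moves the completion check into a small function and adds a timeout. The function names Test-ConversionFinished and Wait-ConversionJob and the 30-minute default are hypothetical choices:

```powershell
# Returns $true once every item in the job has reached a final state
function Test-ConversionFinished {
    param($Status)
    $finished = $Status.Succeeded + $Status.Failed + $Status.Canceled
    return ($Status.Count -eq $finished)
}

# Polls until the job finishes or the timeout expires.
# $GetStatus is a script block that returns a fresh status object on each call.
function Wait-ConversionJob {
    param(
        [scriptblock]$GetStatus,
        [int]$TimeoutSeconds = 1800,
        [int]$PollSeconds = 5
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        $status = & $GetStatus
        if (Test-ConversionFinished $status) { return $status }
        Start-Sleep -Seconds $PollSeconds
    } while ((Get-Date) -lt $deadline)
    throw "Conversion job did not finish within $TimeoutSeconds seconds"
}

# Example call from the conversion script (needs $wasp and $job from earlier,
# and must run before $web.Dispose() if you still want to delete the originals):
# $status = Wait-ConversionJob -GetStatus {
#     New-Object Microsoft.Office.Word.Server.Conversions.ConversionJobStatus($wasp.Id, $job.JobId, $null)
# }
```

Passing the status lookup in as a script block keeps the waiting logic independent of SharePoint, which makes it easy to try out with a stand-in object before pointing it at a real job.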

Once the job has completed, you should see the converted PDF files in the output location specified in the script:

[Screenshot 4: converted PDF files in the output location]

18 comments:

  1. So I've got your code running, many thanks! Question: my Word doc has some embedded links to external files (when you open the doc, it asks to get the latest content from the linked files). When I added $job.Settings.UpdateFields = $True, I was hoping those links would be refreshed. No such luck. Any ideas?

  2. Thanks, great work!

    Here is another solution to create PDF documents:
    http://www.parago.de/2011/04/how-to-export-sharepoint-task-list-data-to-pdf-using-a-templating-system/

    The solution is dynamically creating a PDF document from a SharePoint list using a template engine to customize the PDF output.

  3. To enhance document conversion (including conversion to PDF) in SharePoint workflows, there is the HarePoint Workflow Extensions software
    ( http://www.harepoint.com/Products/HarePointWorkflowExtensions/ ) - about 200 new workflow activities, including free ones.

  4. Great post Phil!

    Have you run this against newer SP2010 CU patch levels? I am able to run the above PowerShell against RTM, but not against SP1 or SP1+Dec CU. Below is a thread where others are seeing the same WCF failure. Any ideas would be greatly appreciated!

    Best,
    @SPJeff


    http://social.technet.microsoft.com/Forums/en-US/sharepoint2010programming/thread/b3933e94-d73b-4bb2-b6ff-f85ccd11947a/?prof=required

  5. Hi,
    Thanks for your post.
    How can I add text to the PDF?

  6. You would have to add text to the Word document before converting it.

  7. Hi,

    Thanks for a good article. Does anyone know if it's possible to get the metadata from the doc written into the PDF item?

  8. Very nice post and it was a great help for me.
    Keep up the good work!
