Wednesday, 2 November 2011

Use PowerShell to check for illegal characters before uploading multiple files into SharePoint

If you have done any sort of bulk file uploading into SharePoint, you will be aware of issues with file names containing illegal characters. These files can disrupt the uploading process, potentially causing many hours of frustrating and time consuming tasks examining and repairing file names.

Files and folders are blocked by SharePoint during the uploading process for the following reasons:

  • They contain the following characters: & { } ~ # % (there are other illegal characters too, but as they are also blocked from use in Windows Explorer, it is assumed you will not have files named with these characters in your file system – if you do, you can adapt the script accordingly)
  • They are 128 characters in length or over
  • They start with a period character
  • They end with a period character
  • They contain consecutive period characters

There is further information available on this criteria here: http://www.thesug.org/mossasaurus/Wiki%20Pages/SharePoint%20Invalid%20Characters.aspx.

The PowerShell script in this article allows you to scan an entire folder structure, including subfolders, and report on all files and folders containing one or more of the conditions listed above. There are also options within the script to automatically rename illegal characters in file names with something acceptable to SharePoint – for example, renaming the & symbol with the word ‘and’.

To use the script, first load the following function in a PowerShell console. Note that loading the function will not actually do anything until you call it later from the command line:

function Check-IllegalCharacters ($Path, [switch]$Fix, [switch]$Verbose)
{
    Write-Host Checking files in $Path, please wait...
    #Get all files and folders under the path specified
    $items = Get-ChildItem -Path $Path -Recurse
    foreach ($item in $items)
    {
        #Check if the item is a file or a folder
        if ($item.PSIsContainer) { $type = "Folder" }
        else { $type = "File" }
       
        #Report item has been found if verbose mode is selected
        if ($Verbose) { Write-Host Found a $type called $item.FullName }
       
        #Check if item name is 128 characters or more in length
        if ($item.Name.Length -gt 127)
        {
            Write-Host $type $item.Name is 128 characters or over and will need to be truncated -ForegroundColor Red
        }
        else
        {
            #Got this from
http://powershell.com/cs/blogs/tips/archive/2011/05/20/finding-multiple-regex-matches.aspx
            $illegalChars = '[&{}~#%]'
            filter Matches($illegalChars)
            {
                $item.Name | Select-String -AllMatches $illegalChars |
                Select-Object -ExpandProperty Matches
                Select-Object -ExpandProperty Values
            }
           
            #Replace illegal characters with legal characters where found
            $newFileName = $item.Name
            Matches $illegalChars | ForEach-Object {
                Write-Host $type $item.FullName has the illegal character $_.Value -ForegroundColor Red
                #These characters may be used on the file system but not SharePoint
                if ($_.Value -match "&") { $newFileName = ($newFileName -replace "&", "and") }
                if ($_.Value -match "{") { $newFileName = ($newFileName -replace "{", "(") }
                if ($_.Value -match "}") { $newFileName = ($newFileName -replace "}", ")") }
                if ($_.Value -match "~") { $newFileName = ($newFileName -replace "~", "-") }
                if ($_.Value -match "#") { $newFileName = ($newFileName -replace "#", "") }
                if ($_.Value -match "%") { $newFileName = ($newFileName -replace "%", "") }
            }
           
            #Check for start, end and double periods
            if ($newFileName.StartsWith(".")) { Write-Host $type $item.FullName starts with a period -ForegroundColor red }
            while ($newFileName.StartsWith(".")) { $newFileName = $newFileName.TrimStart(".") }
            if ($newFileName.EndsWith(".")) { Write-Host $type $item.FullName ends with a period -ForegroundColor Red }
            while ($newFileName.EndsWith("."))   { $newFileName = $newFileName.TrimEnd(".") }
            if ($newFileName.Contains("..")) { Write-Host $type $item.FullName contains double periods -ForegroundColor red }
            while ($newFileName.Contains(".."))  { $newFileName = $newFileName.Replace("..", ".") }
           
            #Fix file and folder names if found and the Fix switch is specified
            if (($newFileName -ne $item.Name) -and ($Fix))
            {
                Rename-Item $item.FullName -NewName ($newFileName)
                Write-Host $type $item.Name has been changed to $newFileName -ForegroundColor Blue
            }
        }
    }
}

As commented in the script, note that I have used a code snippet on the PowerShell.com blog here to find multiple regular expression matches in the file and folder names.

Once loaded, you can call the script using the following commands as examples:

Check-IllegalCharacters -Path C:\Files

The command above will check the folder path specified but will only report file and folder names detected with illegal characters or length.

Check-IllegalCharacters -Path C:\Files -Verbose

This command will also only report files and folder names detected with illegal characters or length, but this time it will also tell you names of the files and folders it has checked in the process. This can be used to make sure the script is checking all the locations you are expecting it to.

Check-IllegalCharacters -Path C:\Files -Fix

The command here will not only check file and folder names for illegal characters, but will also fix them using the rules specified in the script. You can customise these rules as you see fit, but I have gone with the following criteria:

  • Do not change files and folders with names of 128 characters or over (i.e., manually truncate them)
  • Replace two or more consecutive periods in a file or folder name with a single period
  • If the file or folder name either starts or finishes with a period, remove it
  • File or folder names containing illegal characters are processed as follows:
    • Replace ‘&’ with ‘and’
    • Replace ‘{‘ with ‘(‘
    • Replace ‘}’ with ‘)’
    • Replace “~” with “-“
    • Remove the ‘#’ character
    • Remove the ‘%’ character

An example running the script on some files and folders containing deliberately illegal characters is shown below:

The Illegal Files

The following screenshot shows the output from running the script:

image

And evidence that the files were renamed successfully…

The Proof