Wednesday, 2 November 2011

Use PowerShell to check for illegal characters before uploading multiple files into SharePoint

If you have done any sort of bulk file uploading into SharePoint, you will be aware of issues with file names containing illegal characters. These files can disrupt the upload, and examining and repairing the file names by hand can take many frustrating hours.

Files and folders are blocked by SharePoint during the uploading process for the following reasons:

  • They contain the following characters: & { } ~ # % (there are other illegal characters too, but as they are also blocked from use in Windows Explorer, it is assumed you will not have files named with these characters in your file system – if you do, you can adapt the script accordingly)
  • They are 128 characters in length or over
  • They start with a period character
  • They end with a period character
  • They contain consecutive period characters

There is further information available on these criteria here: http://www.thesug.org/mossasaurus/Wiki%20Pages/SharePoint%20Invalid%20Characters.aspx.
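The rules listed above can be expressed as a small check on a single name. This is only a sketch: `Get-NameProblem` is a hypothetical helper, not part of the script in this article, and it covers only the five conditions in the list.

```powershell
# Hypothetical helper (not part of the article's script): returns the list of
# rule violations for a single file or folder name.
function Get-NameProblem {
    param([Parameter(Mandatory = $true)][string]$Name)

    $problems = @()
    if ($Name -match '[&{}~#%]') { $problems += 'illegal character' }
    if ($Name.Length -ge 128)    { $problems += '128 characters or over' }
    if ($Name.StartsWith('.'))   { $problems += 'starts with a period' }
    if ($Name.EndsWith('.'))     { $problems += 'ends with a period' }
    if ($Name.Contains('..'))    { $problems += 'consecutive periods' }

    # The leading comma keeps the result an array even when it is empty
    return ,$problems
}

Get-NameProblem 'Q&A notes.docx'
Get-NameProblem '.hidden..file.'
```

The first call reports an illegal character; the second reports the three period-related problems.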

The PowerShell script in this article scans an entire folder structure, including subfolders, and reports every file and folder that meets one or more of the conditions listed above. It can also optionally rename files and folders automatically, replacing illegal characters with something acceptable to SharePoint – for example, replacing the & symbol with the word ‘and’.

To use the script, first load the following function in a PowerShell console. Note that loading the function will not actually do anything until you call it later from the command line:

function Check-IllegalCharacters ($Path, [switch]$Fix, [switch]$Verbose)
{
    Write-Host "Checking files in $Path, please wait..."
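    #Get all files and folders under the path specified, deepest items first,
    #so that renaming a folder does not invalidate the paths of items still to be checked
    $items = Get-ChildItem -Path $Path -Recurse | Sort-Object -Property FullName -Descending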
    foreach ($item in $items)
    {
        #Check if the item is a file or a folder
        if ($item.PSIsContainer) { $type = "Folder" }
        else { $type = "File" }
       
        #Report item has been found if verbose mode is selected
        if ($Verbose) { Write-Host Found a $type called $item.FullName }
       
        #Check if item name is 128 characters or more in length
        if ($item.Name.Length -gt 127)
        {
            Write-Host $type $item.Name is 128 characters or over and will need to be truncated -ForegroundColor Red
        }
        else
        {
            #Got this from http://powershell.com/cs/blogs/tips/archive/2011/05/20/finding-multiple-regex-matches.aspx
            $illegalChars = '[&{}~#%]'
            filter Matches($illegalChars)
            {
                #Emit one Match object for every illegal character found in the name
                $item.Name | Select-String -AllMatches $illegalChars |
                Select-Object -ExpandProperty Matches
            }
           
            #Replace illegal characters with legal characters where found
            $newFileName = $item.Name
            Matches $illegalChars | ForEach-Object {
                Write-Host $type $item.FullName has the illegal character $_.Value -ForegroundColor Red
                #These characters may be used on the file system but not SharePoint
                if ($_.Value -match "&") { $newFileName = ($newFileName -replace "&", "and") }
                if ($_.Value -match "{") { $newFileName = ($newFileName -replace "{", "(") }
                if ($_.Value -match "}") { $newFileName = ($newFileName -replace "}", ")") }
                if ($_.Value -match "~") { $newFileName = ($newFileName -replace "~", "-") }
                if ($_.Value -match "#") { $newFileName = ($newFileName -replace "#", "") }
                if ($_.Value -match "%") { $newFileName = ($newFileName -replace "%", "") }
            }
           
            #Check for start, end and double periods
            if ($newFileName.StartsWith(".")) { Write-Host $type $item.FullName starts with a period -ForegroundColor red }
            while ($newFileName.StartsWith(".")) { $newFileName = $newFileName.TrimStart(".") }
            if ($newFileName.EndsWith(".")) { Write-Host $type $item.FullName ends with a period -ForegroundColor Red }
            while ($newFileName.EndsWith("."))   { $newFileName = $newFileName.TrimEnd(".") }
            if ($newFileName.Contains("..")) { Write-Host $type $item.FullName contains double periods -ForegroundColor red }
            while ($newFileName.Contains(".."))  { $newFileName = $newFileName.Replace("..", ".") }
           
            #Fix file and folder names if found and the Fix switch is specified
            if (($newFileName -ne $item.Name) -and ($Fix))
            {
                #LiteralPath stops square brackets in a name being treated as wildcards
                Rename-Item -LiteralPath $item.FullName -NewName $newFileName
                Write-Host $type $item.Name has been changed to $newFileName -ForegroundColor Blue
            }
        }
    }
}

As noted in the script comments, the code for finding multiple regular expression matches in the file and folder names is based on a snippet from the PowerShell.com blog (the URL is in the comment).
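To see the multiple-match technique in isolation, here is a minimal sketch using a made-up file name. It shows the `Select-String -AllMatches` approach next to the .NET `[regex]::Matches` method, which is an alternative and not what the script itself uses.

```powershell
$illegalChars = '[&{}~#%]'
$name = 'Sales & Marketing {draft} #2.docx'   # example name, not from the article

# Select-String with -AllMatches records every match on the line, not just the first
$viaSelectString = $name | Select-String -AllMatches $illegalChars |
    Select-Object -ExpandProperty Matches |
    ForEach-Object { $_.Value }

# The .NET regex class gives the same list of matches directly
$found = [regex]::Matches($name, $illegalChars) | ForEach-Object { $_.Value }

$found -join ' '   # & { } #
```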

Once the function is loaded, you can call it using commands such as the following:

Check-IllegalCharacters -Path C:\Files

The command above will check the folder path specified but will only report file and folder names detected with illegal characters or length.

Check-IllegalCharacters -Path C:\Files -Verbose

This command reports the same problems, but also lists the name of every file and folder it checks along the way. Use it to confirm that the script is reaching all the locations you expect.
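If you would rather keep the report for review before fixing anything, one option is a report-only variation that emits objects instead of console text, so the results can be sorted, filtered or exported. The sketch below is an assumption of how that might look: `Get-IllegalItem` is a hypothetical function separate from the script above, and it folds the character and period rules into a single regular expression.

```powershell
# Hypothetical report-only variant: outputs one object per problem item
function Get-IllegalItem {
    param([Parameter(Mandatory = $true)][string]$Path)

    # Illegal characters, a leading period, a trailing period, or consecutive periods
    $pattern = '[&{}~#%]|^\.|\.$|\.\.'

    Get-ChildItem -Path $Path -Recurse |
        Where-Object { $_.Name.Length -ge 128 -or $_.Name -match $pattern } |
        Select-Object @{ Name = 'Type'; Expression = { if ($_.PSIsContainer) { 'Folder' } else { 'File' } } }, FullName
}

# Usage (paths are examples only):
# Get-IllegalItem -Path C:\Files | Export-Csv C:\Temp\illegal-names.csv -NoTypeInformation
```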

Check-IllegalCharacters -Path C:\Files -Fix

The command here will not only check file and folder names for illegal characters, but will also fix them using the rules specified in the script. You can customise these rules as you see fit, but I have gone with the following criteria:

  • Leave files and folders with names of 128 characters or over unchanged (these need to be truncated manually)
  • Replace two or more consecutive periods in a file or folder name with a single period
  • If the file or folder name either starts or finishes with a period, remove it
  • File or folder names containing illegal characters are processed as follows:
    • Replace ‘&’ with ‘and’
    • Replace ‘{’ with ‘(’
    • Replace ‘}’ with ‘)’
    • Replace ‘~’ with ‘-’
    • Remove the ‘#’ character
    • Remove the ‘%’ character
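The renaming rules above can also be tried out on plain strings before anything on disk is touched. The following is a sketch only: `Convert-ToSharePointName` is a hypothetical helper, not part of the script, and it applies the same rules without renaming anything.

```powershell
# Hypothetical helper: applies the renaming rules to a name and returns the
# result without touching the file system.
function Convert-ToSharePointName {
    param([Parameter(Mandatory = $true)][string]$Name)

    # Names of 128 characters or over are left for manual truncation
    if ($Name.Length -ge 128) { return $Name }

    # Literal (non-regex) replacements for the illegal characters
    $newName = $Name.Replace('&', 'and').Replace('{', '(').Replace('}', ')')
    $newName = $newName.Replace('~', '-').Replace('#', '').Replace('%', '')

    # Collapse runs of periods, then strip leading and trailing periods
    $newName = $newName -replace '\.{2,}', '.'
    return $newName.Trim('.')
}

Convert-ToSharePointName 'Q&A {draft}~v2.docx'   # QandA (draft)-v2.docx
```

Because characters are removed or substituted, two different original names can end up with the same new name, so it is worth previewing the results before running with -Fix.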

An example run of the script against some files and folders with deliberately illegal names:

[Screenshot: The Illegal Files]

The output from running the script:

[Screenshot: script output]

And evidence that the files were renamed successfully:

[Screenshot: The Proof]

35 comments:

  1. There are so many times in the past I needed this.

  2. I think this might have been useful but it's too hard to read your light print on black background.

  3. Oh well, there are plenty of other sites giving away all these scripts for free that I'm sure you can read instead - Enjoy!!

    Replies
    1. You are awesome Phil, just saved me hours of coding!

    2. Hi Phil,

      I have a project in classic ASP and now I want to migrate it to SharePoint 2010.
      Could you please help me with it or suggest a solution that I can implement?
      Thank you in advance.

  4. Excellent utility. Many thanks.

  5. That is a fantastic tool, thank you so much you saved hours of miserable life

  6. Nice script. Thank you!

  7. Hi Phil, Thanks for posting...I would like to try this, but have you come across a situation where the users have used illegal characters in attachments? If so, how could we alter your script to access the attachments? The list that I'm trying to fix is an issue list. I would be grateful for any pointers.

  8. When I run the above Commands I just get "The term is not recognized..."

    Any ideas?

    Replies
    1. This is an awesome script but if the folder name has an illegal character and is fixed then the script fails checking the files within the folder because the folder is now a different name.

  9. Any way to get this script to replace * or : ? Trying to modify Mac files.

  10. The script does nothing..?

  11. Made a couple of improvements. Added `r and try running with
    start-transcript -path c:\log.txt
    You can then get a list and look through it before running -fix command.

    I am not that good with powershell. Maybe someone can make that a switch.


    function Check-IllegalCharacters ($Path, [switch]$Fix, [switch]$Verbose)
    {
    Write-Host Checking files in $Path, please wait...`r

    #Get all files and folders under the path specified
    $items = Get-ChildItem -Path $Path -Recurse
    foreach ($item in $items)
    {
    #Check if the item is a file or a folder
    if ($item.PSIsContainer) { $type = "Folder" }
    else { $type = "File" }

    #Report item has been found if verbose mode is selected
    if ($Verbose) { Write-Host Found a $type called $item.FullName`r }

    #Check if item name is 128 characters or more in length
    if ($item.Name.Length -gt 127)
    {
    Write-Host $type $item.Name is 128 characters or over and will need to be truncated `r

    }
    else
    {
    #Got this from http://powershell.com/cs/blogs/tips/archive/2011/05/20/finding-multiple-regex-matches.aspx
    $illegalChars = '[&{}~#%]'
    filter Matches($illegalChars)
    {
    $item.Name | Select-String -AllMatches $illegalChars |
    Select-Object -ExpandProperty Matches
    Select-Object -ExpandProperty Values
    }

    #Replace illegal characters with legal characters where found
    $newFileName = $item.Name
    Matches $illegalChars | ForEach-Object {
    Write-Host $type $item.FullName has the illegal character $_.Value `r
    #These characters may be used on the file system but not SharePoint
    if ($_.Value -match "&") { $newFileName = ($newFileName -replace "&", "and") }
    if ($_.Value -match "{") { $newFileName = ($newFileName -replace "{", "(") }
    if ($_.Value -match "}") { $newFileName = ($newFileName -replace "}", ")") }
    if ($_.Value -match "~") { $newFileName = ($newFileName -replace "~", "-") }
    if ($_.Value -match "#") { $newFileName = ($newFileName -replace "#", "") }
    if ($_.Value -match "%") { $newFileName = ($newFileName -replace "%", "") }
    }

    #Check for start, end and double periods
    if ($newFileName.StartsWith(".")) { Write-Host $type $item.FullName starts with a period `r }
    while ($newFileName.StartsWith(".")) { $newFileName = $newFileName.TrimStart(".") }
    if ($newFileName.EndsWith(".")) { Write-Host $type $item.FullName ends with a period `r }
    while ($newFileName.EndsWith(".")) { $newFileName = $newFileName.TrimEnd(".") }
    if ($newFileName.Contains("..")) { Write-Host $type $item.FullName contains double periods `r }
    while ($newFileName.Contains("..")) { $newFileName = $newFileName.Replace("..", ".") }

    #Fix file and folder names if found and the Fix switch is specified
    if (($newFileName -ne $item.Name) -and ($Fix))
    {
    Rename-Item $item.FullName -NewName ($newFileName)
    Write-Host $type $item.Name has been changed to $newFileName -ForegroundColor Blue `r
    }
    }
    }
    }

    start-transcript -path c:\log.txt

    Check-IllegalCharacters -Path C:\temp

  12. How would you redirect the output of this to a log file for review?

    Replies
    1. Hi,
      I have a project in classic ASP and now I want to migrate it to SharePoint 2010.
      Can anyone help me with how to perform this task?
      Thank you in advance.

  13. Just made my day man! Awesome post. Thanks to Bingle.nu for leading me here!

  14. Awesome post, thanks.

    I added a "-Force" to the get-childitem line (line 5) to make this work against hidden files also.

    I also ran into the problem with the folder name being modified before the file, but I simply re-ran the script a few times to make sure it was all clear... not too much of an issue really.

    For those asking how to dump the output to a text file... just run it with ">C:\output_folder\output_file.txt" at the end, just as you would in a command prompt.

  15. Very useful post! Many thanks! :-)

  16. Thanks for the post, works great!

  17. You sir...have saved my migration pains. Thank you!

  18. So useful - thank you! =D

  19. I've modified the line below so that child items are renamed before their parents.
    $items = Get-ChildItem -Recurse -Path $Path | Sort -Descending FullName

  20. Awesome! Thanks!

    FYI - OneDrive for Business on SharePoint Online now supports additional characters: & ~ { }

    Ref: https://support.office.com/en-nz/article/Invalid-characters-in-file-or-folder-names-or-invalid-file-types-in-OneDrive-for-Business-64883a5d-228e-48f5-b3d2-eb39e07630fa

  21. Insanely helpful, thank you!

  22. Thank you! This worked perfectly. After looking at many other scripts online, this was the one that worked best for my purposes.

  23. This seems perfect for what I am looking for: I would like to be able to remove illegal characters from file and folder names in OneDrive for Business stores. My apologies, as I am not well-versed in PowerShell, but is it possible to set this up to run automatically, and if so, can you explain how?

  25. You can use Long Path Tool to resolve the issues.

  26. Hi Phil...

    I found your site and this script... it's exactly what I was looking for!!
    I do have a question for you. I tested the script on several different user files, and it seemed to work. But for some reason it isn't renaming or correcting all of the files. The odd thing is, for example, it says it found a "double period" and states that it is renaming it, but when I check the file name, it still contains a "double period".
    Any idea why this would be happening?
    I ran PS as administrator, but still no luck.
    Thanks in advance for your help and awesome script!

    Replies
    1. Oh, I also want to add that I am also receiving the error "Rename-Item : Cannot rename because item at '\\UNC\path\example\document.txt' does not exist."
      Again, it is odd that some files were corrected, but others cannot be located even though they are seen as having an invalid character.
