Unicode Strings

March 13, 2026


Julia strings can also hold multibyte Unicode text.

julia> "こんにちは" * "Julia"
"こんにちはJulia"

julia> "こんにちは"[1]
'こ': Unicode U+3053 (category Lo: Letter, other)

Indexing into such a string, however, requires some care.

julia> s = "こんにちは"
"こんにちは"

julia> s[1]
'こ': Unicode U+3053 (category Lo: Letter, other)

julia> s[2]
ERROR: StringIndexError: invalid index [2], valid nearby indices [1]=>'こ', [4]=>'ん'
Stacktrace:
 [1] string_index_err(s::AbstractString, i::Int64)
   @ Base .\strings\string.jl:12
 [2] getindex_continued(s::String, i::Int64, u::UInt32)
   @ Base .\strings\string.jl:473
 [3] getindex(s::String, i::Int64)
   @ Base .\strings\string.jl:465
 [4] top-level scope

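s[2] raises an error because String indices are byte offsets into the UTF-8 encoding, and index 2 points at the second byte of the multibyte first character. The first character occupies 3 bytes here, so s[4] returns the expected result.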

julia> s[4]
'ん': Unicode U+3093 (category Lo: Letter, other)
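
To make the byte-offset behavior concrete, here is a minimal sketch that compares the character count with the byte count and lists which indices are valid. The variable names (n_chars, n_bytes, valid, same) are chosen only for illustration; length, ncodeunits, isvalid, and eachindex are standard Base functions.

```julia
s = "こんにちは"

# Number of characters versus number of UTF-8 code units (bytes).
n_chars = length(s)       # 5 characters
n_bytes = ncodeunits(s)   # 15 bytes, since each hiragana takes 3 bytes

# isvalid(s, i) is true only when byte index i is the start of a character.
valid = [i for i in 1:n_bytes if isvalid(s, i)]   # [1, 4, 7, 10, 13]

# eachindex(s) yields exactly those valid indices.
same = collect(eachindex(s)) == valid             # true
```

Checking isvalid(s, i) before indexing is one way to avoid a StringIndexError when an index comes from arithmetic rather than from nextind or eachindex.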

Working out byte offsets by hand is far too inconvenient, so Julia provides the nextind function, which returns the index of the next character.

julia> nextind(s, 1)
4

julia> nextind(s, 4)
7

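This shows that the second character is at index 4 and the one after it at index 7. To use the result directly, pass it straight into the index expression.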

julia> s[nextind(s, 1)]
'ん': Unicode U+3093 (category Lo: Letter, other)

julia> s[nextind(s, 4)]
'に': Unicode U+306B (category Lo: Letter, other)
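
The same idea extends to walking a whole string. The following is a rough sketch, not the only way to do it: walk is a hypothetical helper written for this example, while firstindex, lastindex, nextind (including its three-argument form), and collect come from Base.

```julia
s = "こんにちは"

# Hypothetical helper: visit every character by stepping with nextind,
# collecting (byte index, character) pairs along the way.
function walk(str::AbstractString)
    out = Tuple{Int,Char}[]
    i = firstindex(str)
    while i <= lastindex(str)
        push!(out, (i, str[i]))
        i = nextind(str, i)   # jump to the start of the next character
    end
    return out
end

visited = walk(s)   # [(1, 'こ'), (4, 'ん'), (7, 'に'), (10, 'ち'), (13, 'は')]

# nextind(s, 0, n) counts n characters forward from index 0,
# giving the byte index of the n-th character.
third = s[nextind(s, 0, 3)]   # 'に'

# When the indices themselves are not needed, iterating the string
# (here via collect) avoids index arithmetic entirely.
chars = collect(s)
```

For most code, iterating over the string or over eachindex(s) is simpler than calling nextind in a loop; the manual loop is mainly useful when the position has to be carried between steps.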
